Abstract

Within the framework of linear vector Gaussian channels with arbitrary signaling, closed-form expressions for 
the Jacobian of the minimum mean square error and Fisher information matrices with respect to arbitrary parameters 
of the system are calculated in this paper. Capitalizing on prior research where the minimum mean square error 
and Fisher information matrices were linked to information-theoretic quantities through differentiation, closed-form 
expressions for the Hessian of the mutual information and the differential entropy are derived. These expressions are 
then used to assess the concavity properties of mutual information and differential entropy under different channel 
conditions and also to derive a multivariate version of the entropy power inequality due to Costa. 

I. Introduction and motivation 

Closed-form expressions for the Hessian matrix of the mutual information with respect to arbitrary parameters 
of the system are useful from a theoretical perspective but also from a practical standpoint. In system design, if 
the mutual information is to be optimized through a gradient algorithm as in [1], the Hessian matrix may be used 
alongside the gradient in Newton's method to speed up the convergence of the algorithm. Additionally, from a
system analysis perspective, the Hessian matrix can also complement the gradient in studying the sensitivity of the
mutual information to variations of the system parameters and, more importantly, in the cases where the mutual
information is concave with respect to the system design parameters, it can also be used to guarantee the global 
optimality of a given design. 

In this sense and within the framework of linear vector Gaussian channels with arbitrary signaling, the purpose 
of this work is twofold. First, we find closed-form expressions for the Hessian matrix of the mutual information,
differential entropy and entropy power with respect to arbitrary parameters of the system and, second, we study the
concavity properties of these quantities. Both goals are intimately related since concavity can be assessed through the
negative definiteness of the Hessian matrix. As intermediate results of our study, we derive closed-form expressions
for the Jacobian of the minimum mean-square error (MMSE) and Fisher information matrices, which are interesting
results in their own right and contribute to the exploration of the fundamental links between information theory
and estimation theory.

Initial connections between information- and estimation-theoretic quantities for linear channels with additive 
Gaussian noise date back to the late fifties: in the proof of Shannon's entropy power inequality [2], Stam used
the fact that the derivative of the output differential entropy with respect to the added noise power is equal to the 
Fisher information of the channel output and attributed this identity to De Bruijn. More than a decade later, the 
links between both worlds strengthened when Duncan [3] and Kadota, Zakai, and Ziv [4] independently represented 
mutual information as a function of the error in causal filtering. 

Much more recently, in [5], Guo, Shamai, and Verdú explored these connections further and, as their
main result, proved that the derivative of the mutual information (and differential entropy) with respect to the
signal-to-noise ratio (SNR) is equal to half the MMSE regardless of the input statistics. The main result in [5] was
generalized to the abstract Wiener space by Zakai in [6] and by Palomar and Verdú in two different directions: in
[1] they calculated the partial derivatives of the mutual information and differential entropy with respect to arbitrary
parameters of the system, rather than with respect to the SNR alone, and in [7] they represented the derivative of
mutual information as a function of the conditional marginal input given the output for channels where the noise
is not constrained to be Gaussian.

Fig. 1. Simplified representation of the relations between the quantities dealt with in this work. The Jacobian, D, and Hessian, H, operators
represent first and second order differentiation, respectively.

In this paper we build upon the setting of [1], where, loosely speaking, it was proved that, for the linear vector
Gaussian channel

$$\boldsymbol{Y} = \mathbf{G}\boldsymbol{S} + \mathbf{C}\boldsymbol{N}, \tag{1}$$

i) the gradients of the differential entropy $h(\boldsymbol{Y})$ and the mutual information $I(\boldsymbol{S}; \boldsymbol{Y})$ with respect to functions
of the linear transformation undergone by the input, $\mathbf{G}$, are linear functions of the MMSE matrix $\mathbf{E}_S$ and ii) the
gradient of the differential entropy $h(\boldsymbol{Y})$ with respect to the linear transformation undergone by the noise, $\mathbf{C}$, is a
linear function of the Fisher information matrix, $\mathbf{J}_Y$.

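In this work, we show that the previous two key quantities $\mathbf{E}_S$ and $\mathbf{J}_Y$, which completely characterize the
first-order derivatives, are not enough to describe the second-order derivatives. For that purpose, we introduce
the more refined conditional MMSE matrix $\boldsymbol{\Phi}_S(\boldsymbol{y})$ and conditional Fisher information matrix $\boldsymbol{\Gamma}_Y(\boldsymbol{y})$ (note that
when these quantities are averaged with respect to the distribution of the output $\boldsymbol{y}$, we recover $\mathbf{E}_S = \mathsf{E}\{\boldsymbol{\Phi}_S(\boldsymbol{Y})\}$
and $\mathbf{J}_Y = \mathsf{E}\{\boldsymbol{\Gamma}_Y(\boldsymbol{Y})\}$). In particular, the second-order derivatives depend on $\boldsymbol{\Phi}_S(\boldsymbol{y})$ and $\boldsymbol{\Gamma}_Y(\boldsymbol{y})$ through the
following terms: $\mathsf{E}\{\boldsymbol{\Phi}_S(\boldsymbol{Y}) \otimes \boldsymbol{\Phi}_S(\boldsymbol{Y})\}$ and $\mathsf{E}\{\boldsymbol{\Gamma}_Y(\boldsymbol{Y}) \otimes \boldsymbol{\Gamma}_Y(\boldsymbol{Y})\}$. See Fig. 1 for a schematic representation of
these relations.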

Analogous results to some of the expressions presented in this paper particularized to the scalar Gaussian channel 
were simultaneously derived in [8], [9], where the second and third derivatives of the mutual information with respect 
to the SNR were calculated. 

As an application of the obtained expressions, we show concavity properties of the mutual information and the
differential entropy, and derive a multivariate generalization of the entropy power inequality (EPI) due to Costa in [10].
Our multivariate EPI has already found an application in [11] to derive outer bounds on the capacity region in 
multiuser channels with feedback. 

This paper is organized as follows. In Section II, the model for the linear vector Gaussian channel is given and
the differential entropy, mutual information, minimum mean-square error, and Fisher information quantities as well
as the relationships among them are introduced. The main results of the paper are given in Section III, where we
present the expressions for the Jacobian matrix of the MMSE and Fisher information and also for the Hessian
matrix of the mutual information and differential entropy. In Section IV, the concavity properties of the mutual
information are studied and in Section V a multivariate generalization of Costa's EPI in [10] is given. Finally, an
extension to the complex-valued case of some of the obtained results is considered in Section VI.

Notation: Straight boldface denotes multivariate quantities such as vectors (lowercase) and matrices (uppercase).
Uppercase italics denote random variables, and their realizations are represented by lowercase italics. The sets of
$q$-dimensional symmetric, positive semidefinite, and positive definite matrices are denoted by $\mathbb{S}^q$, $\mathbb{S}^q_{+}$, and $\mathbb{S}^q_{++}$,
respectively. The elements of a matrix $\mathbf{A}$ are represented by $\mathbf{A}_{ij}$ or $[\mathbf{A}]_{ij}$ interchangeably, whereas the elements
of a vector $\mathbf{a}$ are represented by $a_i$. The operator $\mathrm{diag}(\mathbf{A})$ represents a column vector with the diagonal entries of
matrix $\mathbf{A}$, $\mathrm{Diag}(\mathbf{A})$ and $\mathrm{Diag}(\mathbf{a})$ represent a diagonal matrix whose non-zero elements are given by the diagonal
elements of matrix $\mathbf{A}$ and by the elements of vector $\mathbf{a}$, respectively, and $\mathrm{vec}\,\mathbf{A}$ represents the vector obtained by
stacking the columns of $\mathbf{A}$. For symmetric matrices, $\mathrm{vech}\,\mathbf{A}$ is obtained from $\mathrm{vec}\,\mathbf{A}$ by eliminating the repeated
elements located above the main diagonal of $\mathbf{A}$. The Kronecker matrix product is represented by $\mathbf{A} \otimes \mathbf{B}$ and the
Schur (or Hadamard) element-wise matrix product is denoted by $\mathbf{A} \circ \mathbf{B}$. The superscripts $(\cdot)^{T}$, $(\cdot)^{\dagger}$, and $(\cdot)^{+}$ denote
transpose, Hermitian, and Moore-Penrose pseudo-inverse operations, respectively. With a slight abuse of notation,
we consider that when square root or multiplicative inverse are applied to a vector, they act upon the entries of the
vector; we thus have $[\sqrt{\mathbf{a}}]_i = \sqrt{a_i}$ and $[1/\mathbf{a}]_i = 1/a_i$.
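The operators introduced in the notation paragraph can be illustrated with a minimal NumPy sketch. The helper names `vec` and `vech` are illustrative choices rather than anything defined in the paper, and the column-major stacking order is an assumption that matches the description "stacking the columns of A". The sketch also checks the standard identity vec(AXB) = (B^T ⊗ A) vec(X), which is the usual mechanism by which Kronecker products appear in Jacobian expressions.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into a single vector (column-major order)."""
    return A.flatten(order="F")

def vech(A):
    """Stack the lower-triangular part of a square matrix, column by column."""
    n = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(n)])

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
S = A + A.T                       # a symmetric matrix, so vech(S) keeps all information

kron_AB = np.kron(A, B)           # Kronecker product, shape (9, 9)
hadamard_AB = A * B               # Schur (Hadamard) element-wise product

# Standard identity: vec(A X B) = (B^T kron A) vec(X)
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
```

For a symmetric 3 x 3 matrix, `vech` returns the 6 distinct entries while `vec` returns all 9.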

II. Signal model 

We consider a general discrete-time linear vector Gaussian channel, whose output $\boldsymbol{Y} \in \mathbb{R}^n$ is represented by the
following signal model

$$\boldsymbol{Y} = \mathbf{G}\boldsymbol{S} + \boldsymbol{Z}, \tag{2}$$

where $\boldsymbol{S} \in \mathbb{R}^m$ is the zero-mean channel input vector with covariance matrix $\mathbf{R}_S$, the matrix $\mathbf{G} \in \mathbb{R}^{n \times m}$ specifies
the linear transformation undergone by the input vector, and $\boldsymbol{Z} \in \mathbb{R}^n$ represents a zero-mean Gaussian noise with
non-singular covariance matrix $\mathbf{R}_Z$.

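The channel transition probability density function corresponding to the channel model in (2) is

$$P_{Y|S}(\boldsymbol{y}|\boldsymbol{s}) = P_Z(\boldsymbol{y} - \mathbf{G}\boldsymbol{s}) = \frac{1}{\sqrt{(2\pi)^n \det \mathbf{R}_Z}} \exp\left(-\frac{1}{2}(\boldsymbol{y} - \mathbf{G}\boldsymbol{s})^T \mathbf{R}_Z^{-1} (\boldsymbol{y} - \mathbf{G}\boldsymbol{s})\right) \tag{3}$$

and the marginal probability density function of the output is given by$^1$

$$P_Y(\boldsymbol{y}) = \mathsf{E}\{P_{Y|S}(\boldsymbol{y}|\boldsymbol{S})\}, \tag{4}$$

which is an infinitely differentiable continuous function of $\boldsymbol{y}$ regardless of the distribution of the input vector $\boldsymbol{S}$
thanks to the smoothing properties of the added noise [10, Section II].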

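At some points, it may be convenient to define the random vector $\boldsymbol{X} = \mathbf{G}\boldsymbol{S}$ with covariance matrix given by
$\mathbf{R}_X = \mathbf{G}\mathbf{R}_S\mathbf{G}^T$ and also express the noise vector as $\boldsymbol{Z} = \mathbf{C}\boldsymbol{N}$, where $\mathbf{C} \in \mathbb{R}^{n \times n'}$, such that $n' \geq n$, and the
noise covariance matrix $\mathbf{R}_Z = \mathbf{C}\mathbf{R}_N\mathbf{C}^T$ has an inverse so that (3) is meaningful.

With this notation, $P_{Y|X}(\boldsymbol{y}|\boldsymbol{x})$ can be obtained by replacing $\mathbf{G}\boldsymbol{s}$ by $\boldsymbol{x}$ in (3) and the channel model (2) can be
alternatively rewritten as

$$\boldsymbol{Y} = \mathbf{G}\boldsymbol{S} + \mathbf{C}\boldsymbol{N} = \boldsymbol{X} + \mathbf{C}\boldsymbol{N} = \mathbf{G}\boldsymbol{S} + \boldsymbol{Z} = \boldsymbol{X} + \boldsymbol{Z}. \tag{5}$$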

In the following subsections we describe the information- and estimation-theoretic quantities whose relations we 
are interested in. 

A. Differential entropy and mutual information 

The differential entropy$^2$ of the continuous random vector $\boldsymbol{Y}$ is defined as [12, Chapter 9]

$$h(\boldsymbol{Y}) = -\mathsf{E}\{\log P_Y(\boldsymbol{Y})\}. \tag{6}$$

For the case where the distribution of $\boldsymbol{Y}$ assigns positive mass to one or more singletons in $\mathbb{R}^n$, the above definition
is usually extended with $h(\boldsymbol{Y}) = -\infty$.

For the linear vector Gaussian channel in (5), the input-output mutual information is [12, Chapter 10]

$$I(\boldsymbol{S}; \boldsymbol{Y}) = h(\boldsymbol{Y}) - h(\boldsymbol{Z}) = h(\boldsymbol{Y}) - \frac{1}{2}\log\det(2\pi e\,\mathbf{R}_Z) = h(\boldsymbol{Y}) - \frac{1}{2}\log\det(2\pi e\,\mathbf{C}\mathbf{R}_N\mathbf{C}^T). \tag{7}$$

$^1$ We highlight that in every expression involving integrals, expectation operators, or even a density we should include the statement "if it
exists".

$^2$ Throughout this paper we work with natural logarithms and thus nats are used as information units.
B. MMSE matrix

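We consider the estimation of the input signal $\boldsymbol{S}$ based on the observation of a realization of the output $\boldsymbol{Y} = \boldsymbol{y}$.
The mean square error (MSE) matrix of an estimate $\hat{\boldsymbol{S}}(\boldsymbol{y})$ of the input $\boldsymbol{S}$ given the realization of the output $\boldsymbol{Y} = \boldsymbol{y}$
is defined as $\mathsf{E}\{(\boldsymbol{S} - \hat{\boldsymbol{S}}(\boldsymbol{Y}))(\boldsymbol{S} - \hat{\boldsymbol{S}}(\boldsymbol{Y}))^T\}$ and it gives us a description of the performance of the estimator.

The estimator that simultaneously achieves the minimum MSE for all the components of the estimation error
vector is given by the conditional mean estimator $\hat{\boldsymbol{S}}(\boldsymbol{y}) = \mathsf{E}\{\boldsymbol{S}|\boldsymbol{y}\}$ and the corresponding MSE matrix, referred
to as the MMSE matrix, is

$$\mathbf{E}_S = \mathsf{E}\{(\boldsymbol{S} - \mathsf{E}\{\boldsymbol{S}|\boldsymbol{Y}\})(\boldsymbol{S} - \mathsf{E}\{\boldsymbol{S}|\boldsymbol{Y}\})^T\}. \tag{8}$$

An alternative and useful expression for the MMSE matrix can be obtained by considering first the MMSE matrix
conditioned on a specific realization of the output $\boldsymbol{Y} = \boldsymbol{y}$, which is denoted by $\boldsymbol{\Phi}_S(\boldsymbol{y})$ and defined as:

$$\boldsymbol{\Phi}_S(\boldsymbol{y}) = \mathsf{E}\{(\boldsymbol{S} - \mathsf{E}\{\boldsymbol{S}|\boldsymbol{y}\})(\boldsymbol{S} - \mathsf{E}\{\boldsymbol{S}|\boldsymbol{y}\})^T \mid \boldsymbol{y}\}. \tag{9}$$

Observe from (9) that $\boldsymbol{\Phi}_S(\boldsymbol{y})$ is a positive semidefinite matrix. Finally, the MMSE matrix in (8) can be obtained
by taking the expectation in (9) with respect to the distribution of the output:

$$\mathbf{E}_S = \mathsf{E}\{\boldsymbol{\Phi}_S(\boldsymbol{Y})\}. \tag{10}$$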



C. Fisher information matrix 

Besides the MMSE matrix, another quantity that is closely related to the differential entropy is the Fisher
information matrix with respect to a translation parameter, which is a special case of the Fisher information matrix
[13]. The Fisher information is a measure of the minimum error in estimating a parameter of a distribution and is
closely related to the Cramér-Rao lower bound [14].

For an arbitrary random vector $\boldsymbol{Y}$, the Fisher information matrix with respect to a translation parameter is defined
as

$$\mathbf{J}_Y = \mathsf{E}\{\mathsf{D}_{\boldsymbol{y}}^T \log P_Y(\boldsymbol{Y})\, \mathsf{D}_{\boldsymbol{y}} \log P_Y(\boldsymbol{Y})\}, \tag{11}$$

where $\mathsf{D}$ is the Jacobian operator. This operator together with the Hessian operator, $\mathsf{H}$, and other definitions and
conventions used for differentiation with respect to multidimensional parameters are described in Appendices A
and B.

The expression of the Fisher information in (11) in terms of the Jacobian of $\log P_Y(\boldsymbol{y})$ can be transformed into
an expression in terms of its Hessian matrix, thanks to the logarithmic identity

$$\mathsf{H}_{\boldsymbol{y}} \log P_Y(\boldsymbol{y}) = \frac{\mathsf{H}_{\boldsymbol{y}} P_Y(\boldsymbol{y})}{P_Y(\boldsymbol{y})} - \mathsf{D}_{\boldsymbol{y}}^T \log P_Y(\boldsymbol{y})\, \mathsf{D}_{\boldsymbol{y}} \log P_Y(\boldsymbol{y}) \tag{12}$$

together with the fact that $\mathsf{E}\{\mathsf{H}_{\boldsymbol{y}} P_Y(\boldsymbol{Y}) / P_Y(\boldsymbol{Y})\} = \int \mathsf{H}_{\boldsymbol{y}} P_Y(\boldsymbol{y})\, d\boldsymbol{y} = \mathbf{0}$, which follows directly from the
expression for $\mathsf{H}_{\boldsymbol{y}} P_Y(\boldsymbol{y})$ in (154) in the appendices. The alternative expression for the Fisher information matrix
in terms of the Hessian is then

$$\mathbf{J}_Y = -\mathsf{E}\{\mathsf{H}_{\boldsymbol{y}} \log P_Y(\boldsymbol{Y})\}. \tag{13}$$

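Similarly to the previous section with the MMSE matrix, it will be useful to define a conditional form of the
Fisher information matrix $\boldsymbol{\Gamma}_Y(\boldsymbol{y})$, in such a way that $\mathbf{J}_Y = \mathsf{E}\{\boldsymbol{\Gamma}_Y(\boldsymbol{Y})\}$. At this point, it may not be clear which
of the two forms (11) or (13) will be more useful for the rest of the paper; we advance that defining $\boldsymbol{\Gamma}_Y(\boldsymbol{y})$ based
on (13) will prove more convenient:

$$\boldsymbol{\Gamma}_Y(\boldsymbol{y}) = -\mathsf{H}_{\boldsymbol{y}} \log P_Y(\boldsymbol{y}) = \mathbf{R}_Z^{-1} - \mathbf{R}_Z^{-1} \boldsymbol{\Phi}_X(\boldsymbol{y}) \mathbf{R}_Z^{-1}, \tag{14}$$

where the second equality is proved in Lemma C.4 in Appendix C$^3$ and where we have $\boldsymbol{\Phi}_X(\boldsymbol{y}) = \mathbf{G}\boldsymbol{\Phi}_S(\boldsymbol{y})\mathbf{G}^T$.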
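As a sanity check of (14), consider the special case of a Gaussian vector X. The sketch below assumes two standard facts not restated at this point of the text: for Gaussian X the output Y = X + Z is Gaussian with covariance R_X + R_Z, so that -H_y log P_Y(y) = (R_X + R_Z)^{-1}, and the conditional MMSE matrix of X is (R_X^{-1} + R_Z^{-1})^{-1}. The covariance matrices are randomly generated placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def random_spd(k):
    """Random symmetric positive definite matrix (illustrative helper)."""
    A = rng.standard_normal((k, k))
    return A @ A.T + k * np.eye(k)

R_X = random_spd(n)   # covariance of the Gaussian vector X = G S
R_Z = random_spd(n)   # covariance of the Gaussian noise Z
R_Z_inv = np.linalg.inv(R_Z)

# Conditional MMSE matrix of X given Y = y in the Gaussian case (independent of y).
Phi_X = np.linalg.inv(np.linalg.inv(R_X) + R_Z_inv)

# Left-hand side of (14): minus the Hessian of the Gaussian log-density of Y.
Gamma_lhs = np.linalg.inv(R_X + R_Z)
# Right-hand side of (14).
Gamma_rhs = R_Z_inv - R_Z_inv @ Phi_X @ R_Z_inv
```

In this Gaussian setting, the agreement of both sides is an instance of the matrix inversion lemma.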



$^3$ Note that the lemmas placed in the appendices have a prefix indicating the appendix to which they belong, to ease their localization. From
this point on we will omit the explicit reference to the appendix.

D. Prior known relations among information- and estimation-theoretic quantities

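The first known relation between the above described quantities is the De Bruijn identity [2] (see also the
alternative derivation in [5]), which couples the Fisher information with the differential entropy according to

$$\frac{d}{dt} h(\boldsymbol{X} + \sqrt{t}\,\boldsymbol{Z}) = \frac{1}{2}\,\mathrm{Tr}\,\mathbf{J}_Y, \tag{15}$$

where, in this case, $\boldsymbol{Y} = \boldsymbol{X} + \sqrt{t}\,\boldsymbol{Z}$. A multivariate extension of the De Bruijn identity was found in [1] as

$$\nabla_{\mathbf{C}}\, h(\boldsymbol{X} + \mathbf{C}\boldsymbol{N}) = \mathbf{J}_Y \mathbf{C} \mathbf{R}_N. \tag{16}$$

In [5], the more canonical operational measures of mutual information and MMSE were coupled through the
identity

$$\frac{d}{d\,\mathrm{snr}} I(\boldsymbol{S}; \sqrt{\mathrm{snr}}\,\boldsymbol{S} + \boldsymbol{Z}) = \frac{1}{2}\,\mathrm{Tr}\,\mathbf{E}_S. \tag{17}$$

This result was generalized in [1] to the multivariate case, yielding

$$\nabla_{\mathbf{G}}\, I(\boldsymbol{S}; \mathbf{G}\boldsymbol{S} + \boldsymbol{Z}) = \mathbf{R}_Z^{-1} \mathbf{G} \mathbf{E}_S. \tag{18}$$

Note that the simple dependence of mutual information on differential entropy established in (7) implies that
$\nabla_{\mathbf{G}}\, I(\boldsymbol{S}; \mathbf{G}\boldsymbol{S} + \boldsymbol{Z}) = \nabla_{\mathbf{G}}\, h(\mathbf{G}\boldsymbol{S} + \boldsymbol{Z})$.

From these previous existing results, we realize that the output differential entropy function $h(\mathbf{G}\boldsymbol{S} + \mathbf{C}\boldsymbol{N})$ is
related to the MMSE matrix $\mathbf{E}_S$ through differentiation with respect to the transformation $\mathbf{G}$ undergone by the
signal $\boldsymbol{S}$ (see (18)) and is related to the Fisher information matrix $\mathbf{J}_Y$ through differentiation with respect to the
transformation $\mathbf{C}$ undergone by the Gaussian noise $\boldsymbol{N}$ (see (16)). This is illustrated in Fig. 1. A comprehensive
account of other relations can be found in [5].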
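The gradient identity (18) can be checked numerically in the special case of Gaussian signaling, where closed forms are available. The sketch below assumes a Gaussian input, for which I(S; GS + Z) = (1/2) log det(I_n + R_Z^{-1} G R_S G^T) and E_S = (R_S^{-1} + G^T R_Z^{-1} G)^{-1}; the dimensions, random matrices, and the finite-difference step are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3

def random_spd(k):
    """Random symmetric positive definite matrix (illustrative helper)."""
    A = rng.standard_normal((k, k))
    return A @ A.T + k * np.eye(k)

R_S = random_spd(m)
R_Z = random_spd(n)
R_Z_inv = np.linalg.inv(R_Z)
G = rng.standard_normal((n, m))

def mutual_information(G):
    """I(S; GS + Z) in nats, assuming Gaussian S."""
    _, logdet = np.linalg.slogdet(np.eye(n) + R_Z_inv @ G @ R_S @ G.T)
    return 0.5 * logdet

def mmse_matrix(G):
    """MMSE matrix E_S for Gaussian S."""
    return np.linalg.inv(np.linalg.inv(R_S) + G.T @ R_Z_inv @ G)

# Right-hand side of (18).
grad_formula = R_Z_inv @ G @ mmse_matrix(G)

# Central finite differences, one entry of G at a time.
step = 1e-6
grad_numeric = np.zeros_like(G)
for i in range(n):
    for j in range(m):
        dG = np.zeros_like(G)
        dG[i, j] = step
        grad_numeric[i, j] = (
            mutual_information(G + dG) - mutual_information(G - dG)
        ) / (2 * step)
```

The comparison only exercises the Gaussian case; (18) itself holds for arbitrary input distributions according to [1].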

Since we are interested in calculating the Hessian matrix of differential entropy and mutual information quantities,
in the light of the results in (16) and (18), it is instrumental to first calculate the Jacobian matrix of the MMSE
and Fisher information matrices, as considered in the next section.

III. Jacobian and Hessian results 

In order to derive the Hessian of the differential entropy and the mutual information, we start by obtaining the 
Jacobians of the Fisher information matrix and the MMSE matrix. 

A. Jacobian of the Fisher information matrix 

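As a warm-up, consider first the signal model in (5) with Gaussian signaling, $\boldsymbol{Y}_G = \boldsymbol{X}_G + \mathbf{C}\boldsymbol{N}$. In this case,
the conditional Fisher information matrix defined in (14) does not depend on the realization of the received vector
$\boldsymbol{y}$ and is (e.g., [14, Appendix 3C])

$$\boldsymbol{\Gamma}_{Y_G} = (\mathbf{R}_{X_G} + \mathbf{R}_Z)^{-1} = (\mathbf{R}_{X_G} + \mathbf{C}\mathbf{R}_N\mathbf{C}^T)^{-1}. \tag{19}$$

Consequently, we have that $\mathbf{J}_{Y_G} = \mathsf{E}\{\boldsymbol{\Gamma}_{Y_G}\} = \boldsymbol{\Gamma}_{Y_G}$.

The Jacobian matrix of the Fisher information matrix with respect to the noise transformation $\mathbf{C}$ can be readily
obtained as

$$\mathsf{D}_{\mathbf{C}} \mathbf{J}_{Y_G} = \mathsf{D}_{\mathbf{R}_Z} \mathbf{J}_{Y_G} \cdot \mathsf{D}_{\mathbf{C}} \mathbf{R}_Z = \mathsf{D}_{\mathbf{R}_Z} (\mathbf{R}_{X_G} + \mathbf{R}_Z)^{-1} \cdot \mathsf{D}_{\mathbf{C}}\, \mathbf{C}\mathbf{R}_N\mathbf{C}^T \tag{20}$$

$$= -\mathbf{D}_n^{+} (\mathbf{J}_{Y_G} \otimes \mathbf{J}_{Y_G}) \mathbf{D}_n \cdot 2\mathbf{D}_n^{+} (\mathbf{C}\mathbf{R}_N \otimes \mathbf{I}_n) \tag{21}$$

$$= -2\mathbf{D}_n^{+} (\mathbf{J}_{Y_G} \otimes \mathbf{J}_{Y_G}) (\mathbf{C}\mathbf{R}_N \otimes \mathbf{I}_n) \tag{22}$$

$$= -2\mathbf{D}_n^{+}\, \mathsf{E}\{\boldsymbol{\Gamma}_{Y_G} \otimes \boldsymbol{\Gamma}_{Y_G}\} (\mathbf{C}\mathbf{R}_N \otimes \mathbf{I}_n), \tag{23}$$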

where (20) follows from the Jacobian chain rule in Lemma B.5; in (21) we have applied Lemmas B.7.6 and B.7.7,
with $\mathbf{D}_n^{+}$ being the Moore-Penrose inverse of the duplication matrix $\mathbf{D}_n$ defined in Appendix A;$^4$ and finally (22)
follows from the facts that $\mathbf{D}_n \mathbf{D}_n^{+} = \mathbf{N}_n$, $\mathbf{D}_n^{+} \mathbf{N}_n = \mathbf{D}_n^{+}$, and $(\mathbf{A} \otimes \mathbf{A})\mathbf{N}_n = \mathbf{N}_n(\mathbf{A} \otimes \mathbf{A})$, which are given in (132)
and (129) in Appendix A, respectively.

$^4$ The matrix $\mathbf{D}_n$ appears in (23) and in many successive expressions because we are explicitly taking into account the fact that $\mathbf{J}_Y$ is a
symmetric matrix. The reader is referred to Appendices A and B for more details on the conventions used in this paper.
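The Gaussian-signaling Jacobian in (22) lends itself to a finite-difference check. The following sketch makes several assumptions that are not fixed by the text at this point: vec stacks columns, vech stacks the lower triangle column by column, the duplication matrix satisfies D_n vech(A) = vec(A) for symmetric A, and the Jacobian D_C J is arranged as the derivative of vech(J) with respect to vec(C)^T. The helper names and random test matrices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_prime = 3, 4

def random_spd(k):
    A = rng.standard_normal((k, k))
    return A @ A.T + k * np.eye(k)

def vech(A):
    """Lower-triangular part of A stacked column by column."""
    return np.concatenate([A[j:, j] for j in range(A.shape[0])])

def duplication_matrix(k):
    """Matrix D_k with D_k @ vech(A) == vec(A) for symmetric A (column-major vec)."""
    pairs = [(i, j) for j in range(k) for i in range(j, k)]
    D = np.zeros((k * k, len(pairs)))
    for col, (i, j) in enumerate(pairs):
        D[j * k + i, col] = 1.0   # position of A[i, j] in vec(A)
        D[i * k + j, col] = 1.0   # position of A[j, i] in vec(A)
    return D

R_X = random_spd(n)
R_N = random_spd(n_prime)
C = rng.standard_normal((n, n_prime))

def fisher(C):
    """Fisher information matrix of Y = X + C N for Gaussian X, as in (19)."""
    return np.linalg.inv(R_X + C @ R_N @ C.T)

D_n_pinv = np.linalg.pinv(duplication_matrix(n))
J = fisher(C)

# Closed form (22): -2 D_n^+ (J kron J) (C R_N kron I_n).
jac_formula = -2.0 * D_n_pinv @ np.kron(J, J) @ np.kron(C @ R_N, np.eye(n))

# Numerical Jacobian of vech(J) with respect to vec(C), by central differences.
step = 1e-6
jac_numeric = np.zeros((n * (n + 1) // 2, n * n_prime))
for k in range(n * n_prime):
    e = np.zeros(n * n_prime)
    e[k] = step
    dC = e.reshape((n, n_prime), order="F")
    jac_numeric[:, k] = (vech(fisher(C + dC)) - vech(fisher(C - dC))) / (2 * step)
```

Because J is symmetric, only its n(n+1)/2 distinct entries are differentiated, which is why the Jacobian has n(n+1)/2 rows.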

In the following theorem we generalize (23) for the case of arbitrary signaling.

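Theorem 1 (Jacobian of the Fisher information matrix): Consider the signal model $\boldsymbol{Y} = \boldsymbol{X} + \mathbf{C}\boldsymbol{N}$, where $\mathbf{C}$
is an arbitrary deterministic matrix, the signaling $\boldsymbol{X}$ is arbitrarily distributed, and the noise vector $\boldsymbol{N}$ is Gaussian
and independent of the input $\boldsymbol{X}$. Then, the Jacobian of the Fisher information matrix of the $n$-dimensional output
vector $\boldsymbol{Y}$ is

$$\mathsf{D}_{\mathbf{C}} \mathbf{J}_Y = -2\mathbf{D}_n^{+}\, \mathsf{E}\{\boldsymbol{\Gamma}_Y(\boldsymbol{Y}) \otimes \boldsymbol{\Gamma}_Y(\boldsymbol{Y})\} (\mathbf{C}\mathbf{R}_N \otimes \mathbf{I}_n), \tag{24}$$

where $\boldsymbol{\Gamma}_Y(\boldsymbol{y})$ is defined in (14).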

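Proof: Since $\mathbf{J}_Y$ is a symmetric matrix, its Jacobian can be written as

$$\mathsf{D}_{\mathbf{C}} \mathbf{J}_Y = \mathsf{D}_{\mathbf{C}}\, \mathrm{vech}\, \mathbf{J}_Y \tag{25}$$

$$= \mathsf{D}_{\mathbf{C}}\, \mathbf{D}_n^{+} \mathrm{vec}\, \mathbf{J}_Y \tag{26}$$

$$= \mathbf{D}_n^{+}\, \mathsf{D}_{\mathbf{C}}\, \mathrm{vec}\, \mathbf{J}_Y \tag{27}$$

$$= \mathbf{D}_n^{+} \left(-2\mathbf{N}_n\, \mathsf{E}\{\boldsymbol{\Gamma}_Y(\boldsymbol{Y})\mathbf{C}\mathbf{R}_N \otimes \boldsymbol{\Gamma}_Y(\boldsymbol{Y})\}\right) \tag{28}$$

$$= -2\mathbf{D}_n^{+}\, \mathsf{E}\{\boldsymbol{\Gamma}_Y(\boldsymbol{Y}) \otimes \boldsymbol{\Gamma}_Y(\boldsymbol{Y})\} (\mathbf{C}\mathbf{R}_N \otimes \mathbf{I}_n), \tag{29}$$

where (26) follows from (131) in Appendix A and (27) follows from Lemma B.7.2. The expression for $\mathsf{D}_{\mathbf{C}}\, \mathrm{vec}\, \mathbf{J}_Y$
is derived in Appendix D, which yields (28), and (29) follows from Lemma A.3 and $\mathbf{D}_n^{+}\mathbf{N}_n = \mathbf{D}_n^{+}$, as detailed in
Appendix A. ■

Remark 1: Due to the fact that, in general, the conditional Fisher information matrix $\boldsymbol{\Gamma}_Y(\boldsymbol{y})$ does depend on the
particular value of the observation $\boldsymbol{y}$, it is not possible to express the expectation of the Kronecker product as the
Kronecker product of the expectations, as in (22) for the Gaussian signaling case, where $\boldsymbol{\Gamma}_{Y_G}$ does not depend on
the particular value of the observation $\boldsymbol{y}$.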



B. Jacobian of the MMSE matrix 

Again, as a warm-up, before dealing with the arbitrary signaling case we consider first the signal model in (5)
with Gaussian signaling, $\boldsymbol{Y}_G = \mathbf{G}\boldsymbol{S}_G + \boldsymbol{Z}$, and study the properties of the conditional MMSE matrix, $\boldsymbol{\Phi}_{S_G}(\boldsymbol{y})$,
which does not depend on the particular realization of the observed vector $\boldsymbol{y}$. Precisely, we have [14, Chapter 11]

$$\boldsymbol{\Phi}_{S_G} = (\mathbf{R}_S^{-1} + \mathbf{G}^T \mathbf{R}_Z^{-1} \mathbf{G})^{-1} \tag{30}$$

and thus $\mathbf{E}_{S_G} = \mathsf{E}\{\boldsymbol{\Phi}_{S_G}\} = \boldsymbol{\Phi}_{S_G}$.

Following similar steps as in (20)-(23) for the Fisher information matrix, the Jacobian matrix of the MMSE
matrix with respect to the signal transformation $\mathbf{G}$ can be readily obtained as

$$\mathsf{D}_{\mathbf{G}} \mathbf{E}_{S_G} = -2\mathbf{D}_m^{+}\, \mathsf{E}\{\boldsymbol{\Phi}_{S_G} \otimes \boldsymbol{\Phi}_{S_G}\} (\mathbf{I}_m \otimes \mathbf{G}^T \mathbf{R}_Z^{-1}). \tag{31}$$

Note that the expression in (31) for the Jacobian of the MMSE matrix has a very similar structure to the Jacobian
for the Fisher information matrix in (23). The following theorem formalizes the fact that the Gaussian assumption
is unnecessary for (31) to hold.

Theorem 2 (Jacobian of the MMSE matrix): Consider the signal model $\boldsymbol{Y} = \mathbf{G}\boldsymbol{S} + \boldsymbol{Z}$, where $\mathbf{G}$ is an arbitrary
deterministic matrix, the $m$-dimensional signaling $\boldsymbol{S}$ is arbitrarily distributed, and the noise vector $\boldsymbol{Z}$ is Gaussian
and independent of the input $\boldsymbol{S}$. Then, the Jacobian of the MMSE matrix of the input vector $\boldsymbol{S}$ is

$$\mathsf{D}_{\mathbf{G}} \mathbf{E}_S = -2\mathbf{D}_m^{+}\, \mathsf{E}\{\boldsymbol{\Phi}_S(\boldsymbol{Y}) \otimes \boldsymbol{\Phi}_S(\boldsymbol{Y})\} (\mathbf{I}_m \otimes \mathbf{G}^T \mathbf{R}_Z^{-1}), \tag{32}$$

where $\boldsymbol{\Phi}_S(\boldsymbol{y})$ is defined in (9).

Proof: The proof is analogous to that of Theorem 1 with the appropriate notation adaptation. The calculation
of $\mathsf{D}_{\mathbf{G}}\, \mathrm{vec}\, \mathbf{E}_S$ can be found in Appendix E. ■

Remark 2: In light of the two results in Theorems 1 and 2, it is now apparent that $\boldsymbol{\Gamma}_Y(\boldsymbol{y})$ plays an analogous role
in the differentiation of the Fisher information matrix as the one played by the conditional MMSE matrix $\boldsymbol{\Phi}_S(\boldsymbol{y})$
when differentiating the MMSE matrix, which justifies the choice made in Section II-C of identifying $\boldsymbol{\Gamma}_Y(\boldsymbol{y})$ with
the expression in (13) and not with the expression in (11).
C. Jacobians with respect to arbitrary parameters

With the basic results for the Jacobian of the MMSE and Fisher information matrices in Theorems 1 and 2, one can
easily find the Jacobian with respect to arbitrary parameters of the system through the chain rule for differentiation
(see Lemma B.5). Precisely, we are interested in considering the case where the linear transformation undergone by
the signal is decomposed as the product of two linear transformations, $\mathbf{G} = \mathbf{H}\mathbf{P}$, where $\mathbf{H}$ represents the channel,
which is externally determined by the propagation environment conditions, and $\mathbf{P}$ represents the linear precoder,
which is specified by the system designer.

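Theorem 3 (Jacobians with respect to arbitrary parameters): Consider the signal model $\boldsymbol{Y} = \mathbf{H}\mathbf{P}\boldsymbol{S} + \mathbf{C}\boldsymbol{N}$,
where $\mathbf{H} \in \mathbb{R}^{n \times p}$, $\mathbf{P} \in \mathbb{R}^{p \times m}$, and $\mathbf{C} \in \mathbb{R}^{n \times n'}$, with $n' \geq n$, are arbitrary deterministic matrices, the signaling
$\boldsymbol{S} \in \mathbb{R}^m$ is arbitrarily distributed, the noise $\boldsymbol{N} \in \mathbb{R}^{n'}$ is Gaussian, independent of the input $\boldsymbol{S}$, and has covariance
matrix $\mathbf{R}_N$, and the total noise, defined as $\boldsymbol{Z} = \mathbf{C}\boldsymbol{N} \in \mathbb{R}^n$, has a positive definite covariance matrix given by
$\mathbf{R}_Z = \mathbf{C}\mathbf{R}_N\mathbf{C}^T$. Then, the MMSE and Fisher information matrices satisfy

$$\mathsf{D}_{\mathbf{P}} \mathbf{E}_S = -2\mathbf{D}_m^{+}\, \mathsf{E}\{\boldsymbol{\Phi}_S(\boldsymbol{Y}) \otimes \boldsymbol{\Phi}_S(\boldsymbol{Y})\} (\mathbf{I}_m \otimes \mathbf{P}^T\mathbf{H}^T\mathbf{R}_Z^{-1}\mathbf{H}) \tag{33}$$

$$\mathsf{D}_{\mathbf{H}} \mathbf{E}_S = -2\mathbf{D}_m^{+}\, \mathsf{E}\{\boldsymbol{\Phi}_S(\boldsymbol{Y}) \otimes \boldsymbol{\Phi}_S(\boldsymbol{Y})\} (\mathbf{P}^T \otimes \mathbf{P}^T\mathbf{H}^T\mathbf{R}_Z^{-1}) \tag{34}$$

$$\mathsf{D}_{\mathbf{R}_Z} \mathbf{J}_Y = -\mathbf{D}_n^{+}\, \mathsf{E}\{\boldsymbol{\Gamma}_Y(\boldsymbol{Y}) \otimes \boldsymbol{\Gamma}_Y(\boldsymbol{Y})\}\, \mathbf{D}_n \tag{35}$$

$$\mathsf{D}_{\mathbf{R}_N} \mathbf{J}_Y = -\mathbf{D}_n^{+}\, \mathsf{E}\{\boldsymbol{\Gamma}_Y(\boldsymbol{Y}) \otimes \boldsymbol{\Gamma}_Y(\boldsymbol{Y})\} (\mathbf{C} \otimes \mathbf{C})\, \mathbf{D}_{n'}. \tag{36}$$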

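Proof: The Jacobians $\mathsf{D}_{\mathbf{P}}\mathbf{E}_S$ and $\mathsf{D}_{\mathbf{H}}\mathbf{E}_S$ follow from the Jacobian $\mathsf{D}_{\mathbf{G}}\mathbf{E}_S$ calculated in Theorem 2 by applying
the following chain rules (from Lemma B.5):

$$\mathsf{D}_{\mathbf{P}} \mathbf{E}_S = \mathsf{D}_{\mathbf{G}} \mathbf{E}_S \cdot \mathsf{D}_{\mathbf{P}} \mathbf{G} \tag{37}$$

$$\mathsf{D}_{\mathbf{H}} \mathbf{E}_S = \mathsf{D}_{\mathbf{G}} \mathbf{E}_S \cdot \mathsf{D}_{\mathbf{H}} \mathbf{G}, \tag{38}$$

where $\mathbf{G} = \mathbf{H}\mathbf{P}$ and where $\mathsf{D}_{\mathbf{P}}\mathbf{G} = \mathbf{I}_m \otimes \mathbf{H}$ and $\mathsf{D}_{\mathbf{H}}\mathbf{G} = \mathbf{P}^T \otimes \mathbf{I}_n$ can be found in Lemma B.7.1.
Similarly, the Jacobian $\mathsf{D}_{\mathbf{R}_Z}\mathbf{J}_Y$ can be calculated by applying

$$\mathsf{D}_{\mathbf{C}} \mathbf{J}_Y = \mathsf{D}_{\mathbf{R}_Z} \mathbf{J}_Y \cdot \mathsf{D}_{\mathbf{C}} \mathbf{R}_Z, \tag{39}$$

where $\mathsf{D}_{\mathbf{C}}\mathbf{R}_Z = 2\mathbf{D}_n^{+}(\mathbf{C}\mathbf{R}_N \otimes \mathbf{I}_n)$ as in Lemma B.7.7. Recalling that, in this case, the matrix $\mathbf{C}$ is a dummy
variable that is used only to obtain $\mathsf{D}_{\mathbf{R}_Z}\mathbf{J}_Y$ through the chain rule, the factor $(\mathbf{C}\mathbf{R}_N \otimes \mathbf{I}_n)$ can be eliminated from
both sides of the equation. Using that $\mathbf{D}_n^{+}\mathbf{D}_n$ is the identity matrix of dimension $n(n+1)/2$, the result follows.
Finally, the Jacobian $\mathsf{D}_{\mathbf{R}_N}\mathbf{J}_Y$ follows from the chain rule

$$\mathsf{D}_{\mathbf{R}_N} \mathbf{J}_Y = \mathsf{D}_{\mathbf{R}_Z} \mathbf{J}_Y \cdot \mathsf{D}_{\mathbf{R}_N} \mathbf{R}_Z = \mathsf{D}_{\mathbf{R}_Z} \mathbf{J}_Y \cdot \mathbf{D}_n^{+}(\mathbf{C} \otimes \mathbf{C})\mathbf{D}_{n'}, \tag{40}$$

where the expression for $\mathsf{D}_{\mathbf{R}_N}\mathbf{R}_Z$ is obtained from Lemma B.7.3 and where we have used that $\mathbf{D}_n^{+}(\mathbf{A} \otimes \mathbf{A})\mathbf{D}_n\mathbf{D}_n^{+} =
\mathbf{D}_n^{+}(\mathbf{A} \otimes \mathbf{A})\mathbf{N}_n = \mathbf{D}_n^{+}\mathbf{N}_n(\mathbf{A} \otimes \mathbf{A}) = \mathbf{D}_n^{+}(\mathbf{A} \otimes \mathbf{A})$. ■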
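The chain-rule step behind (33) can be illustrated numerically under Gaussian signaling, where the conditional MMSE matrix does not depend on y and equals E_S = (R_S^{-1} + P^T H^T R_Z^{-1} H P)^{-1}. The sketch below is a hedged check rather than a proof: the Gaussian assumption, the column-major vec and lower-triangular vech conventions, and the helper names are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, m = 4, 3, 2

def random_spd(k):
    A = rng.standard_normal((k, k))
    return A @ A.T + k * np.eye(k)

def vech(A):
    """Lower-triangular part of A stacked column by column."""
    return np.concatenate([A[j:, j] for j in range(A.shape[0])])

def duplication_matrix(k):
    """Matrix D_k with D_k @ vech(A) == vec(A) for symmetric A (column-major vec)."""
    pairs = [(i, j) for j in range(k) for i in range(j, k)]
    D = np.zeros((k * k, len(pairs)))
    for col, (i, j) in enumerate(pairs):
        D[j * k + i, col] = 1.0
        D[i * k + j, col] = 1.0
    return D

H = rng.standard_normal((n, p))
P = rng.standard_normal((p, m))
R_S = random_spd(m)
R_Z = random_spd(n)
R_H = H.T @ np.linalg.inv(R_Z) @ H          # H^T R_Z^{-1} H, shape (p, p)

def mmse_matrix(P):
    """E_S for Gaussian S and G = H P."""
    return np.linalg.inv(np.linalg.inv(R_S) + P.T @ R_H @ P)

E = mmse_matrix(P)
D_m_pinv = np.linalg.pinv(duplication_matrix(m))

# Closed form (33), with E{Phi kron Phi} = E kron E in the Gaussian case.
jac_formula = -2.0 * D_m_pinv @ np.kron(E, E) @ np.kron(np.eye(m), P.T @ R_H)

# Numerical Jacobian of vech(E_S) with respect to vec(P).
step = 1e-6
jac_numeric = np.zeros((m * (m + 1) // 2, p * m))
for k in range(p * m):
    e = np.zeros(p * m)
    e[k] = step
    dP = e.reshape((p, m), order="F")
    jac_numeric[:, k] = (vech(mmse_matrix(P + dP)) - vech(mmse_matrix(P - dP))) / (2 * step)
```

The same pattern, with `np.kron(P.T, P.T @ H.T @ np.linalg.inv(R_Z))` as the last factor and a perturbation of H, would exercise (34).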

D. Hessian of differential entropy and mutual information 

Now that we have obtained the Jacobians of the MMSE and Fisher matrices, we will capitalize on the results 
in [1] to obtain the Hessians of the mutual information I(S; Y) and the differential entropy h(Y). We start by 
recalling the results that will be used. 

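Lemma 1 (Differential entropy Jacobians [1]): Consider the setting of Theorem 3. Then, the differential entropy
of the output vector $\boldsymbol{Y}$, $h(\boldsymbol{Y})$, satisfies

$$\mathsf{D}_{\mathbf{P}}\, h(\boldsymbol{Y}) = \mathrm{vec}^T(\mathbf{H}^T\mathbf{R}_Z^{-1}\mathbf{H}\mathbf{P}\mathbf{E}_S) \tag{41}$$

$$\mathsf{D}_{\mathbf{H}}\, h(\boldsymbol{Y}) = \mathrm{vec}^T(\mathbf{R}_Z^{-1}\mathbf{H}\mathbf{P}\mathbf{E}_S\mathbf{P}^T) \tag{42}$$

$$\mathsf{D}_{\mathbf{C}}\, h(\boldsymbol{Y}) = \mathrm{vec}^T(\mathbf{J}_Y\mathbf{C}\mathbf{R}_N) \tag{43}$$

$$\mathsf{D}_{\mathbf{R}_Z}\, h(\boldsymbol{Y}) = \frac{1}{2}\mathrm{vec}^T(\mathbf{J}_Y)\,\mathbf{D}_n \tag{44}$$

$$\mathsf{D}_{\mathbf{R}_N}\, h(\boldsymbol{Y}) = \frac{1}{2}\mathrm{vec}^T(\mathbf{C}^T\mathbf{J}_Y\mathbf{C})\,\mathbf{D}_{n'}. \tag{45}$$

Remark 3: Note that in [1] the authors gave the expressions (41) and (42) for the mutual information. Recalling
the simple relation (7) between mutual information and differential entropy for the linear vector Gaussian channel, it
becomes easy to see that (41) and (42) are also valid by replacing the differential entropy by the mutual information
because the differential entropy of the noise vector is independent of $\mathbf{P}$ and $\mathbf{H}$.

Remark 4: In contrast, the expressions (43), (44), and (45) do not hold verbatim for the mutual information
because, in that case, the differential entropy of the noise vector does depend on $\mathbf{C}$, $\mathbf{R}_Z$, and $\mathbf{R}_N$ and it has to be
taken into account. Then, from (7) and applying basic Jacobian results from [15, Chapter 9], we have

$$\mathsf{D}_{\mathbf{C}}\, I(\boldsymbol{S}; \boldsymbol{Y}) = \mathsf{D}_{\mathbf{C}}\, h(\boldsymbol{Y}) - \mathrm{vec}^T((\mathbf{C}\mathbf{R}_N\mathbf{C}^T)^{-1}\mathbf{C}\mathbf{R}_N) \tag{46}$$

$$\mathsf{D}_{\mathbf{R}_Z}\, I(\boldsymbol{S}; \boldsymbol{Y}) = \mathsf{D}_{\mathbf{R}_Z}\, h(\boldsymbol{Y}) - \frac{1}{2}\mathrm{vec}^T(\mathbf{R}_Z^{-1})\,\mathbf{D}_n \tag{47}$$

$$\mathsf{D}_{\mathbf{R}_N}\, I(\boldsymbol{S}; \boldsymbol{Y}) = \mathsf{D}_{\mathbf{R}_N}\, h(\boldsymbol{Y}) - \frac{1}{2}\mathrm{vec}^T(\mathbf{C}^T(\mathbf{C}\mathbf{R}_N\mathbf{C}^T)^{-1}\mathbf{C})\,\mathbf{D}_{n'}. \tag{48}$$

With Lemma 1 at hand, and the expressions obtained in the previous section for the Jacobian matrices of the
Fisher information and the MMSE matrices, we are ready to calculate the Hessian matrix with respect to all the
parameters of interest.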

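Theorem 4 (Differential entropy Hessians): Consider the setting of Theorem 3. Then, the differential entropy of
the output vector $\boldsymbol{Y}$, $h(\boldsymbol{Y})$, satisfies

$$\mathsf{H}_{\mathbf{P}}\, h(\boldsymbol{Y}) = (\mathbf{E}_S \otimes \mathbf{H}^T\mathbf{R}_Z^{-1}\mathbf{H}) - 2(\mathbf{I}_m \otimes \mathbf{H}^T\mathbf{R}_Z^{-1}\mathbf{H}\mathbf{P})\,\mathbf{N}_m\, \mathsf{E}\{\boldsymbol{\Phi}_S(\boldsymbol{Y}) \otimes \boldsymbol{\Phi}_S(\boldsymbol{Y})\} (\mathbf{I}_m \otimes \mathbf{P}^T\mathbf{H}^T\mathbf{R}_Z^{-1}\mathbf{H})$$

$$\mathsf{H}_{\mathbf{H}}\, h(\boldsymbol{Y}) = (\mathbf{P}\mathbf{E}_S\mathbf{P}^T \otimes \mathbf{R}_Z^{-1}) - 2(\mathbf{P} \otimes \mathbf{R}_Z^{-1}\mathbf{H}\mathbf{P})\,\mathbf{N}_m\, \mathsf{E}\{\boldsymbol{\Phi}_S(\boldsymbol{Y}) \otimes \boldsymbol{\Phi}_S(\boldsymbol{Y})\} (\mathbf{P}^T \otimes \mathbf{P}^T\mathbf{H}^T\mathbf{R}_Z^{-1})$$

$$= (\mathbf{E}_{PS} \otimes \mathbf{R}_Z^{-1}) - 2(\mathbf{I}_p \otimes \mathbf{R}_Z^{-1}\mathbf{H})\,\mathbf{N}_p\, \mathsf{E}\{\boldsymbol{\Phi}_{PS}(\boldsymbol{Y}) \otimes \boldsymbol{\Phi}_{PS}(\boldsymbol{Y})\} (\mathbf{I}_p \otimes \mathbf{H}^T\mathbf{R}_Z^{-1}) \tag{49}$$

$$\mathsf{H}_{\mathbf{C}}\, h(\boldsymbol{Y}) = (\mathbf{R}_N \otimes \mathbf{J}_Y) - 2(\mathbf{R}_N\mathbf{C}^T \otimes \mathbf{I}_n)\,\mathbf{N}_n\, \mathsf{E}\{\boldsymbol{\Gamma}_Y(\boldsymbol{Y}) \otimes \boldsymbol{\Gamma}_Y(\boldsymbol{Y})\} (\mathbf{C}\mathbf{R}_N \otimes \mathbf{I}_n) \tag{50}$$

$$\mathsf{H}_{\mathbf{R}_Z}\, h(\boldsymbol{Y}) = -\frac{1}{2}\mathbf{D}_n^T\, \mathsf{E}\{\boldsymbol{\Gamma}_Y(\boldsymbol{Y}) \otimes \boldsymbol{\Gamma}_Y(\boldsymbol{Y})\}\, \mathbf{D}_n \tag{51}$$

$$\mathsf{H}_{\mathbf{R}_N}\, h(\boldsymbol{Y}) = -\frac{1}{2}\mathbf{D}_{n'}^T (\mathbf{C}^T \otimes \mathbf{C}^T)\, \mathsf{E}\{\boldsymbol{\Gamma}_Y(\boldsymbol{Y}) \otimes \boldsymbol{\Gamma}_Y(\boldsymbol{Y})\} (\mathbf{C} \otimes \mathbf{C})\, \mathbf{D}_{n'}. \tag{52}$$

Proof: See the appendix. ■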



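Remark 5: The Hessian results in Theorem 4 are given for the differential entropy. The Hessian matrices for
the mutual information can be found straightforwardly from (7) and Remarks 3 and 4 as $\mathsf{H}_{\mathbf{P}} I(\boldsymbol{S}; \boldsymbol{Y}) = \mathsf{H}_{\mathbf{P}} h(\boldsymbol{Y})$,
$\mathsf{H}_{\mathbf{H}} I(\boldsymbol{S}; \boldsymbol{Y}) = \mathsf{H}_{\mathbf{H}} h(\boldsymbol{Y})$, and

$$\mathsf{H}_{\mathbf{C}}\, I(\boldsymbol{S}; \boldsymbol{Y}) = \mathsf{H}_{\mathbf{C}}\, h(\boldsymbol{Y}) + 2(\mathbf{R}_N\mathbf{C}^T \otimes \mathbf{I}_n)\,\mathbf{N}_n \left((\mathbf{C}\mathbf{R}_N\mathbf{C}^T)^{-1} \otimes (\mathbf{C}\mathbf{R}_N\mathbf{C}^T)^{-1}\right) (\mathbf{C}\mathbf{R}_N \otimes \mathbf{I}_n) - \mathbf{R}_N \otimes (\mathbf{C}\mathbf{R}_N\mathbf{C}^T)^{-1} \tag{53}$$

$$\mathsf{H}_{\mathbf{R}_Z}\, I(\boldsymbol{S}; \boldsymbol{Y}) = \mathsf{H}_{\mathbf{R}_Z}\, h(\boldsymbol{Y}) + \frac{1}{2}\mathbf{D}_n^T (\mathbf{R}_Z^{-1} \otimes \mathbf{R}_Z^{-1})\, \mathbf{D}_n \tag{54}$$

$$\mathsf{H}_{\mathbf{R}_N}\, I(\boldsymbol{S}; \boldsymbol{Y}) = \mathsf{H}_{\mathbf{R}_N}\, h(\boldsymbol{Y}) + \frac{1}{2}\mathbf{D}_{n'}^T \left((\mathbf{C}^T(\mathbf{C}\mathbf{R}_N\mathbf{C}^T)^{-1}\mathbf{C}) \otimes (\mathbf{C}^T(\mathbf{C}\mathbf{R}_N\mathbf{C}^T)^{-1}\mathbf{C})\right) \mathbf{D}_{n'}. \tag{55}$$

E. Hessian of mutual information with respect to the transmitted signal covariance

While in the previous sections we have obtained expressions for the Jacobian of the MMSE and the Hessian of
the mutual information and differential entropy with respect to the noise covariances $\mathbf{R}_Z$ and $\mathbf{R}_N$ among others,
we have purposely avoided calculating these Jacobian and Hessian matrices with respect to covariance matrices of
the signal such as the squared precoder $\mathbf{Q}_P = \mathbf{P}\mathbf{P}^T$, the transmitted signal covariance $\mathbf{Q} = \mathbf{P}\mathbf{R}_S\mathbf{P}^T$, or the input
signal covariance $\mathbf{R}_S$.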

The reason is that, in general, the mutual information, the differential entropy, and the MMSE are not functions
of $\mathbf{Q}_P$, $\mathbf{Q}$, or $\mathbf{R}_S$ alone. It can be seen, for example, by noting that, given $\mathbf{Q}_P$, the corresponding precoder matrix
$\mathbf{P}$ is specified up to an arbitrary orthonormal transformation, as both $\mathbf{P}$ and $\mathbf{P}\mathbf{V}$, with $\mathbf{V}$ being orthonormal, yield
the same squared precoder $\mathbf{Q}_P$. Now, it is easy to see that the two precoders $\mathbf{P}$ and $\mathbf{P}\mathbf{V}$ need not yield the same
mutual information, and, thus, the mutual information is not well defined as a function of $\mathbf{Q}_P$ alone because the
mutual information cannot be uniquely determined from $\mathbf{Q}_P$. The same reasoning applies to the differential entropy
and the MMSE matrix.

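There are, however, some particular cases where the quantities of mutual information and differential entropy
are indeed functions of $\mathbf{Q}_P$, $\mathbf{Q}$, or $\mathbf{R}_S$. We have, for example, the particular case where the signaling is Gaussian,
$\boldsymbol{S} = \boldsymbol{S}_G$. In this case, the mutual information is given by

$$I(\boldsymbol{S}_G; \boldsymbol{Y}_G) = \frac{1}{2}\log\det(\mathbf{I}_n + \mathbf{R}_Z^{-1}\mathbf{H}\mathbf{P}\mathbf{R}_S\mathbf{P}^T\mathbf{H}^T), \tag{56}$$

which is, of course, a function of the transmitted signal covariance $\mathbf{Q} = \mathbf{P}\mathbf{R}_S\mathbf{P}^T$, a function of the input signal
covariance $\mathbf{R}_S$, and also a function of the squared precoder $\mathbf{Q}_P = \mathbf{P}\mathbf{P}^T$ when $\mathbf{R}_S = \mathbf{I}_m$.
Upon direct differentiation with respect to, e.g., $\mathbf{Q}$ we obtain [15, Chapter 9]

$$\mathsf{D}_{\mathbf{Q}}\, I(\boldsymbol{S}_G; \boldsymbol{Y}_G) = \frac{1}{2}\mathrm{vec}^T\left(\mathbf{H}^T\mathbf{R}_Z^{-1}\mathbf{H}(\mathbf{I}_p + \mathbf{Q}\mathbf{H}^T\mathbf{R}_Z^{-1}\mathbf{H})^{-1}\right)\mathbf{D}_p, \tag{57}$$

which, after some algebra, agrees with the result in [1, Theorem 2, Eq. (23)] adapted to our notation,

$$\mathsf{D}_{\mathbf{Q}}\, I(\boldsymbol{S}_G; \boldsymbol{Y}_G) = \frac{1}{2}\mathrm{vec}^T\left(\mathbf{H}^T\mathbf{R}_Z^{-1}\mathbf{H}\mathbf{P}\mathbf{E}_{S_G}\mathbf{R}_S^{-1}\mathbf{P}^{-1}\right)\mathbf{D}_p, \tag{58}$$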

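where, for the sake of simplicity, we have assumed that the inverses of $\mathbf{P}$ and $\mathbf{R}_S$ exist and where the MMSE
is given by $\mathbf{E}_{S_G} = (\mathbf{R}_S^{-1} + \mathbf{P}^T\mathbf{H}^T\mathbf{R}_Z^{-1}\mathbf{H}\mathbf{P})^{-1}$. Note now that the MMSE matrix is not a function of $\mathbf{Q}$ and,
consequently, it cannot be used to derive the Hessian of the mutual information with respect to $\mathbf{Q}$ as we have
done in Section III-D for other variables such as $\mathbf{P}$ or $\mathbf{C}$. Therefore, the Hessian of the mutual information for the
Gaussian signaling case has to be obtained by direct differentiation of the expression in (57) with respect to $\mathbf{Q}$,
yielding [15, Chapter 10]

$$\mathsf{H}_{\mathbf{Q}}\, I(\boldsymbol{S}_G; \boldsymbol{Y}_G) = -\frac{1}{2}\mathbf{D}_p^T\left(\left((\mathbf{I}_p + \mathbf{H}^T\mathbf{R}_Z^{-1}\mathbf{H}\mathbf{Q})^{-1}\mathbf{H}^T\mathbf{R}_Z^{-1}\mathbf{H}\right) \otimes \left(\mathbf{H}^T\mathbf{R}_Z^{-1}\mathbf{H}(\mathbf{I}_p + \mathbf{Q}\mathbf{H}^T\mathbf{R}_Z^{-1}\mathbf{H})^{-1}\right)\right)\mathbf{D}_p. \tag{59}$$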
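A quick way to probe the Gaussian-signaling Hessian with respect to Q is through a directional second derivative. Writing A = H^T R_Z^{-1} H and M = (I_p + AQ)^{-1} A = A (I_p + QA)^{-1}, differentiating (57) along a symmetric direction Δ gives the curvature -(1/2) vec(Δ)^T (M ⊗ M) vec(Δ). The sketch below compares this with a second-order finite difference of (56); the matrices, the direction `Delta`, and the step size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 4, 3

def random_spd(k):
    B = rng.standard_normal((k, k))
    return B @ B.T + k * np.eye(k)

H = rng.standard_normal((n, p))
R_Z = random_spd(n)
Q = random_spd(p)                              # transmitted signal covariance
A = H.T @ np.linalg.inv(R_Z) @ H               # H^T R_Z^{-1} H

def mutual_information(Q):
    """Gaussian-signaling mutual information (56) written in terms of Q."""
    _, logdet = np.linalg.slogdet(np.eye(p) + A @ Q)
    return 0.5 * logdet

# Symmetric matrix appearing in both Kronecker factors of the Hessian.
M = A @ np.linalg.inv(np.eye(p) + Q @ A)

# Random symmetric direction with unit Frobenius norm.
Delta = rng.standard_normal((p, p))
Delta = Delta + Delta.T
Delta /= np.linalg.norm(Delta)
vec_Delta = Delta.flatten(order="F")

curv_formula = -0.5 * vec_Delta @ np.kron(M, M) @ vec_Delta

step = 1e-3
curv_numeric = (
    mutual_information(Q + step * Delta)
    - 2.0 * mutual_information(Q)
    + mutual_information(Q - step * Delta)
) / step**2
```

The negative curvature along every symmetric direction is what makes the Gaussian-signaling mutual information concave in Q.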

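Another particular case where the mutual information is a function of the transmit covariance matrices is in the
low-SNR regime [16]. Assuming that $\mathbf{R}_Z = N_0\mathbf{I}$, Prelov and Verdú showed that [16, Theorem 3]

$$I(\boldsymbol{S}; \boldsymbol{Y}) = \frac{1}{2N_0}\mathrm{Tr}(\mathbf{H}\mathbf{P}\mathbf{R}_S\mathbf{P}^T\mathbf{H}^T) - \frac{1}{4N_0^2}\mathrm{Tr}\left((\mathbf{H}\mathbf{P}\mathbf{R}_S\mathbf{P}^T\mathbf{H}^T)^2\right) + o(N_0^{-2}), \tag{60}$$

where the dependence (up to terms $o(N_0^{-2})$) of the mutual information with respect to $\mathbf{Q} = \mathbf{P}\mathbf{R}_S\mathbf{P}^T$ is explicitly
shown. The Jacobian and Hessian of the mutual information, for this particular case, become [15, Chapters 9 and
10]:

$$\mathsf{D}_{\mathbf{Q}}\, I(\boldsymbol{S}; \boldsymbol{Y}) = \frac{1}{2N_0}\mathrm{vec}^T(\mathbf{H}^T\mathbf{H})\,\mathbf{D}_p - \frac{1}{2N_0^2}\mathrm{vec}^T(\mathbf{H}^T\mathbf{H}\mathbf{Q}\mathbf{H}^T\mathbf{H})\,\mathbf{D}_p + o(N_0^{-2}) \tag{61}$$

$$\mathsf{H}_{\mathbf{Q}}\, I(\boldsymbol{S}; \boldsymbol{Y}) = -\frac{1}{2N_0^2}\mathbf{D}_p^T(\mathbf{H}^T\mathbf{H} \otimes \mathbf{H}^T\mathbf{H})\,\mathbf{D}_p + o(N_0^{-2}). \tag{62}$$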
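The second-order expansion (60) can be sanity-checked in the one case where the exact mutual information is available in closed form, namely Gaussian signaling with R_Z = N_0 I. The sketch below is restricted to that assumption and does not test the general statement of [16]. It verifies that the approximation error, scaled by N_0^2, shrinks as N_0 grows, which is what an o(N_0^{-2}) remainder requires. The matrices and noise levels are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 3, 2

H = rng.standard_normal((n, p))
B = rng.standard_normal((p, p))
Q = B @ B.T                      # plays the role of Q = P R_S P^T
X = H @ Q @ H.T                  # received signal covariance H Q H^T

def exact_mi(N0):
    """Exact mutual information for Gaussian signaling and R_Z = N0 * I."""
    _, logdet = np.linalg.slogdet(np.eye(n) + X / N0)
    return 0.5 * logdet

def low_snr_mi(N0):
    """First two terms of the expansion (60)."""
    return np.trace(X) / (2 * N0) - np.trace(X @ X) / (4 * N0**2)

noise_levels = [1e1, 1e2, 1e3]
# Error multiplied by N0^2: should tend to zero for an o(N0^-2) remainder.
scaled_errors = [abs(exact_mi(N0) - low_snr_mi(N0)) * N0**2 for N0 in noise_levels]
# In the Gaussian case the next term of the series is Tr(X^3) / (6 N0^3).
cubic_bounds = [np.trace(X @ X @ X) / (6 * N0) for N0 in noise_levels]
```

In this Gaussian example the scaled error decays roughly like 1/N_0, consistent with a cubic leading remainder term.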

Even though we have shown two particular cases where the mutual information is a function of the transmitted
signal covariance matrix $\mathbf{Q} = \mathbf{P}\mathbf{R}_S\mathbf{P}^T$, it is important to highlight that care must be taken when calculating the
Jacobian matrix of the MMSE and the Hessian matrix of the mutual information or differential entropy as, in
general, these quantities are not functions of $\mathbf{Q}_P$, $\mathbf{Q}$, or $\mathbf{R}_S$. In this sense, the results in [1, Theorem 2, Eqs.
(23), (24), (25); Corollary 2, Eq. (49); Theorem 4, Eq. (56)] only make sense when the mutual information is well
defined as a function of the signal covariance matrix (such as the cases seen above where the signaling is Gaussian
or the SNR is low).



IV. Mutual information concavity results 

As we have mentioned in the introduction, studying the concavity of the mutual information with respect to 
design parameters of the system is important from both analysis and design perspectives. 

The first candidate as a system parameter of interest that naturally arises is the precoder matrix $\mathbf{P}$ in the signal
model $\boldsymbol{Y} = \mathbf{H}\mathbf{P}\boldsymbol{S} + \boldsymbol{Z}$. However, one realizes from the expression $\mathsf{H}_{\mathbf{P}} I(\boldsymbol{S}; \boldsymbol{Y})$ in Remark 5 of Theorem 4 that for
a sufficiently small $\mathbf{P}$ the Hessian is approximately $\mathsf{H}_{\mathbf{P}} I(\boldsymbol{S}; \boldsymbol{Y}) \approx \mathbf{E}_S \otimes \mathbf{H}^T\mathbf{R}_Z^{-1}\mathbf{H}$, which, from Lemma G.3, is
positive definite and, consequently, the mutual information is not concave in $\mathbf{P}$ (actually, it is convex). Numerical
computations show that the non-concavity of the mutual information with respect to $\mathbf{P}$ also holds for non-small $\mathbf{P}$.

The next candidate is the transmitted signal covariance matrix $\mathbf{Q}$, which, at first sight, is better suited than the
precoder $\mathbf{P}$ as it is well known that, for the Gaussian signaling case, the mutual information as in (56) is a concave
function of the transmitted signal covariance $\mathbf{Q}$. Similarly, in the low SNR regime we have that, from (62), the
mutual information is also a concave function with respect to $\mathbf{Q}$.

Since in this work we are interested in the properties of the mutual information for all the SNR range and for 
arbitrary signaling, we wish to study if the above results can be generalized. Unfortunately, as discussed in the 
previous section, the first difference of the general case with respect to the particular cases of Gaussian signaling 
and low SNR is that the mutual information is not well defined as a function of the transmitted signal covariance 
Q only. 

Having discarded the concavity of the mutual information with respect to P and Q, in the following subsections 
we study the concavity of the mutual information with respect to other parameters of the system. 

For notational convenience we define the channel covariance matrix as $R_H = H^T R_Z^{-1} H$, which will be used in 
the remainder of the paper. 



A. The scalar case: concavity in the SNR 

The concavity of the mutual information with respect to the SNR for arbitrary input distributions can be derived 
as a corollary from Costa's results in [10], where he proved the concavity of the entropy power of a random 
variable consisting of the sum of a signal and Gaussian noise with respect to the power of the signal. As a direct 
consequence, the concavity of the entropy power implies the concavity of the mutual information in the signal 
power, or, equivalently, in the SNR. 

In this section, we give an explicit expression of the Hessian of the mutual information with respect to the SNR, 
which was previously unavailable for vector Gaussian channels. 

Corollary 1 (Mutual information Hessian with respect to the SNR): Consider the signal model $Y = \sqrt{\mathrm{snr}}\, H S + Z$, 
with $\mathrm{snr} > 0$ and where all the terms are defined as in Theorem 3. Then, 

$\mathsf{H}_{\mathrm{snr}} I(S;Y) = \frac{d^2 I(S;Y)}{d\,\mathrm{snr}^2} = -\frac{1}{2} \mathrm{Tr}\, E\{(R_H \Phi_S(Y))^2\}.$ (63) 

Moreover, $\mathsf{H}_{\mathrm{snr}} I(S;Y) \le 0$ for all snr, which implies that the mutual information is a concave function with 
respect to snr. 

Proof: First, we consider the result in [1, Corollary 1], 

$\mathsf{D}_{\mathrm{snr}} I(S;Y) = \frac{1}{2} \mathrm{Tr}\, R_H E_S.$ (64) 

Now, we only need to choose $P = \sqrt{\mathrm{snr}}\, I_p$, which implies $m = p$, and apply the Jacobian $\mathsf{D}_P E_S$ derived earlier 
together with the chain rule in Lemma B.5 to obtain 

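$\mathsf{H}_{\mathrm{snr}} I(S;Y) = \frac{1}{2} \mathsf{D}_{\mathrm{snr}} \mathrm{Tr}\, R_H E_S$ (65) 
$= \frac{1}{2} \mathsf{D}_{E_S} \mathrm{Tr}\, R_H E_S \cdot \mathsf{D}_P E_S \cdot \mathsf{D}_{\mathrm{snr}} P$ (66) 
$= \frac{1}{2} \mathrm{vec}^T(R_H)\, D_p \left( -2 D_p^+ E\{\Phi_S(Y) \otimes \Phi_S(Y)\} (I_p \otimes \sqrt{\mathrm{snr}}\, R_H) \right) \frac{1}{2\sqrt{\mathrm{snr}}} \mathrm{vec}\, I_p$ (67) 
$= -\frac{1}{2} \mathrm{vec}^T(R_H)\, E\{\Phi_S(Y) \otimes \Phi_S(Y)\}\, \mathrm{vec}\, R_H,$ (68) 

where in the last equality we have used Lemma A.4, the equality $D_p D_p^+ = N_p$, and the fact that, for symmetric 
matrices, $\mathrm{vec}^T(R_H) N_p = \mathrm{vec}^T R_H$, as in (128) in Appendix A. 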

From the expression in (68), it readily follows that the mutual information is a concave function of the snr 
parameter because, from Lemma G.3, we have that $\Phi_S(y) \otimes \Phi_S(y) \ge 0$, $\forall y$, and, consequently, $\mathsf{H}_{\mathrm{snr}} I(S;Y) \le 0$. 
Finally, applying again Lemma A.4 and $\mathrm{vec}^T A\, \mathrm{vec}\, B = \mathrm{Tr}\, A^T B$, the expression for the Hessian in the corollary 
follows. ■ 
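The scalar form of the Hessian in (63) can be checked numerically. The following is a minimal sketch, assuming a binary input S = ±1 with equal probabilities, H = 1 and unit-variance noise (so that R_H = 1 and the conditional MMSE reduces to Var(S | Y = y) = 1 - tanh²(√snr y)). The function names, grid step, and integration span are illustrative choices, not part of the paper. It compares a finite-difference derivative of (64) with the closed form -(1/2) E{Var(S|Y)²}.

```python
import math

def bpsk_moments(snr, step=1e-3, span=12.0):
    """Return (mmse, E{Var(S|Y)^2}) for S = +/-1, Y = sqrt(snr)*S + Z, Z ~ N(0, 1)."""
    a = math.sqrt(snr)
    lo = -(span + a)
    n = int(round(2.0 * (span + a) / step))
    mmse = 0.0
    var_sq = 0.0
    for i in range(n + 1):
        y = lo + i * step
        # Density of Y: equal-weight mixture of N(+a, 1) and N(-a, 1).
        p = 0.5 * (math.exp(-0.5 * (y - a) ** 2) + math.exp(-0.5 * (y + a) ** 2))
        p /= math.sqrt(2.0 * math.pi)
        v = 1.0 - math.tanh(a * y) ** 2      # conditional variance Var(S | Y = y)
        mmse += p * v * step
        var_sq += p * v * v * step
    return mmse, var_sq

def hessian_check(snr, h=1e-3):
    """Finite-difference second derivative of I(snr) versus the scalar form of (63)."""
    m_plus, _ = bpsk_moments(snr + h)
    m_minus, _ = bpsk_moments(snr - h)
    _, var_sq = bpsk_moments(snr)
    numeric = 0.5 * (m_plus - m_minus) / (2.0 * h)   # derivative of (64): dI/dsnr = mmse / 2
    closed_form = -0.5 * var_sq                       # -(1/2) E{Var(S|Y)^2}
    return numeric, closed_form

numeric, closed_form = hessian_check(1.0)
print(numeric, closed_form)
```

Both numbers should be negative and agree up to the discretization error of the finite difference, which illustrates the concavity in snr for this particular input distribution.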

Remark 6: Observe that (63) agrees with [9, Prop. 5] for scalar Gaussian channels. 
We now ask whether the concavity result in Corollary 1 can be extended to more general quantities than the 
scalar SNR. In the following section we study the concavity of the mutual information with respect to the squared 
singular values of the precoder for the simple case where the left singular vectors of the precoder coincide with the 
eigenvectors of the channel covariance matrix $R_H$, which is commonly referred to as the case where the precoder 
diagonalizes the channel. 

B. Concavity in the squared singular values of the precoder when the precoder diagonalizes the channel 

Consider the eigendecomposition of the $p \times p$ channel covariance matrix $R_H = U_H \mathrm{Diag}(\sigma) U_H^T$, where $U_H \in \mathbb{R}^{p \times p}$ 
is an orthonormal matrix and the vector $\sigma \in \mathbb{R}^p$ contains non-negative entries in decreasing order. Note that 
in the case where $\mathrm{rank}(R_H) = p' < p$, the last $p - \mathrm{rank}(R_H)$ elements of the vector $\sigma$ are zero. 

Let us now consider the singular value decomposition (SVD) of the $p \times m$ precoder matrix $P = U_P \mathrm{Diag}(\sqrt{\lambda}) V_P^T$. 
For the case where $m \ge p$, we have that $U_P \in \mathbb{R}^{p \times p}$ is an orthonormal matrix, the vector $\lambda$ is $p$-dimensional, 
and the matrix $V_P \in \mathbb{R}^{m \times p}$ contains orthonormal columns such that $V_P^T V_P = I_p$. For the case $m < p$, the 
matrix $U_P \in \mathbb{R}^{p \times m}$ contains orthonormal columns such that $U_P^T U_P = I_m$, the vector $\lambda$ is $m$-dimensional, and 
$V_P \in \mathbb{R}^{m \times m}$ is an orthonormal matrix. 

In the following theorem we assume $m \ge p$ for the sake of simplicity, and we characterize the concavity properties 
of the mutual information with respect to the entries of the squared singular values vector $\lambda$ for the particular case 
where the left singular vectors of the precoder coincide with the eigenvectors of the channel covariance matrix, 
$U_P = U_H$. The result for the case $m < p$ is stated after the following theorem, and is left without proof because 
it follows similar steps. 

Theorem 5 (Mutual information Hessian with respect to the squared singular values of the precoder): Consider 
$Y = HPS + Z$, where all the terms are defined as in Theorem 3, for the particular case where the eigenvectors 
of the channel covariance matrix $R_H$ and the left singular vectors of the precoder $P \in \mathbb{R}^{p \times m}$ coincide, i.e., 
$U_P = U_H$, and where we have $m \ge p$. Then, the Hessian of the mutual information with respect to the squared 
singular values of the precoder $\lambda$ is: 

$\mathsf{H}_\lambda I(S;Y) = -\frac{1}{2} \mathrm{Diag}(\sigma)\, E\{\Phi_{V_P^T S}(Y) \circ \Phi_{V_P^T S}(Y)\}\, \mathrm{Diag}(\sigma),$ (69) 

where we recall that $A \circ B$ denotes the Schur (or Hadamard) product. Moreover, the Hessian matrix $\mathsf{H}_\lambda I(S;Y)$ 
is negative semidefinite, which implies that the mutual information is a concave function of the squared singular 
values of the precoder. 

Proof: The Hessian of the mutual information $\mathsf{H}_\lambda I(S;Y)$ can be obtained from the Hessian chain rule in 
Lemma B.5 as 

$\mathsf{H}_\lambda I(S;Y) = \mathsf{D}_\lambda^T P \; \mathsf{H}_P I(S;Y) \; \mathsf{D}_\lambda P + (\mathsf{D}_P I(S;Y) \otimes I_p)\, \mathsf{H}_\lambda P.$ (70) 



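Now we need to calculate $\mathsf{D}_\lambda P$ and $\mathsf{H}_\lambda P$. The expression for $\mathsf{D}_\lambda P$ follows as 

$\mathsf{D}_\lambda P = \mathsf{D}_\lambda \left( U_H \mathrm{Diag}(\sqrt{\lambda}) V_P^T \right)$ (71) 
$= (V_P \otimes U_H)\, \mathsf{D}_\lambda \mathrm{Diag}(\sqrt{\lambda})$ (72) 
$= \frac{1}{2} (V_P \otimes U_H)\, S_p\, (\mathrm{Diag}(\sqrt{\lambda}))^{-1},$ (73) 

where, in (72), we have used Lemmas A.4 and B.7.2 and where the last step follows from 

$[\mathsf{D}_\lambda \mathrm{vec}\, \mathrm{Diag}(\sqrt{\lambda})]_{i+(j-1)p,\,k} = \frac{\partial}{\partial \lambda_k} \left( \sqrt{\lambda_i}\, \delta_{ij} \right) = \frac{1}{2\sqrt{\lambda_k}}\, \delta_{ij} \delta_{ik}, \quad \{i,j,k\} \in [1,p],$ (74) 

and recalling the definition of the reduction matrix $S_p$ in (133), $[S_p]_{i+(j-1)p,\,k} = \delta_{ij} \delta_{ik}$. 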

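Following steps similar to the derivation of $\mathsf{D}_\lambda P$, the Hessian matrix $\mathsf{H}_\lambda P$ is obtained according to 

$\mathsf{H}_\lambda P = \mathsf{D}_\lambda (\mathsf{D}_\lambda^T P) = \frac{1}{2} \mathsf{D}_\lambda \left( (\mathrm{Diag}(\sqrt{\lambda}))^{-1} S_p^T (V_P^T \otimes U_H^T) \right)$ (75) 
$= \frac{1}{2} \left( ((V_P \otimes U_H) S_p) \otimes I_p \right) \mathsf{D}_\lambda (\mathrm{Diag}(\sqrt{\lambda}))^{-1}$ (76) 
$= -\frac{1}{4} \left( ((V_P \otimes U_H) S_p) \otimes I_p \right) S_p\, (\mathrm{Diag}(\sqrt{\lambda}))^{-3}.$ (77) 

Plugging (73) and (77) in (70) and operating together with the expressions for the Jacobian matrix $\mathsf{D}_P I(S;Y)$ 
and the Hessian matrix $\mathsf{H}_P I(S;Y)$ given in Remark 3 and in Remark 5 of Theorem 4, respectively, 
we obtain 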

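$\mathsf{H}_\lambda I(S;Y) = \frac{1}{4} (\mathrm{Diag}(\sqrt{\lambda}))^{-1} S_p^T \Big( E_{V_P^T S} \otimes \mathrm{Diag}(\sigma) - 2 (I_p \otimes \mathrm{Diag}(\sigma \circ \sqrt{\lambda}))\, N_p$ 
$\quad \times E\{ V_P^T \Phi_S(Y) V_P \otimes V_P^T \Phi_S(Y) V_P \} (I_p \otimes \mathrm{Diag}(\sigma \circ \sqrt{\lambda})) \Big) S_p\, (\mathrm{Diag}(\sqrt{\lambda}))^{-1}$ 
$\quad - \frac{1}{4} \left( \mathrm{vec}^T(\mathrm{Diag}(\sigma \circ \sqrt{\lambda})\, E_{V_P^T S})\, S_p \otimes I_p \right) S_p\, (\mathrm{Diag}(\sqrt{\lambda}))^{-3},$ (78) 

where it can be noted that the dependence of $\mathsf{H}_\lambda I(S;Y)$ on $U_H$ has disappeared. 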
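Now, applying Lemma A.2, the first term in the last equation becomes 

$\frac{1}{4} (\mathrm{Diag}(\sqrt{\lambda}))^{-1} S_p^T \left( E_{V_P^T S} \otimes \mathrm{Diag}(\sigma) \right) S_p\, (\mathrm{Diag}(\sqrt{\lambda}))^{-1}$ 
$= \frac{1}{4} (\mathrm{Diag}(\sqrt{\lambda}))^{-1} \left( E_{V_P^T S} \circ \mathrm{Diag}(\sigma) \right) (\mathrm{Diag}(\sqrt{\lambda}))^{-1}$ (79) 
$= \frac{1}{4} E_{V_P^T S} \circ \mathrm{Diag}(\sigma \circ (1/\lambda)),$ (80) 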

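whereas the third term in (78), up to its negative sign, can be expressed as 

$\frac{1}{4} \left( \mathrm{vec}^T(\mathrm{Diag}(\sigma \circ \sqrt{\lambda})\, E_{V_P^T S})\, S_p \otimes I_p \right) S_p\, (\mathrm{Diag}(\sqrt{\lambda}))^{-3}$ 
$= \frac{1}{4} \mathrm{Diag}\left( \mathrm{Diag}(\sigma \circ \sqrt{\lambda})\, E_{V_P^T S} \right) (\mathrm{Diag}(\sqrt{\lambda}))^{-3}$ (81) 
$= \frac{1}{4} E_{V_P^T S} \circ \mathrm{Diag}(\sigma \circ (1/\lambda)),$ (82) 

where in (81) we have used that, for any square matrix $A \in \mathbb{R}^{p \times p}$, 

$[\mathrm{vec}^T(A)\, S_p]_k = \sum_{i,j=1}^{p} A_{ij}\, \delta_{ij} \delta_{ik} = A_{kk},$ (83) 
$[(\mathrm{diag}(A)^T \otimes I_p)\, S_p]_{kl} = \sum_{i,j=1}^{p} A_{jj}\, \delta_{ki} \delta_{ij} \delta_{il} = A_{kk}\, \delta_{kl}.$ (84) 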

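Now, from (80) and (82) we see that the first and third terms in (78) cancel out and, recalling that $V_P^T \Phi_S(Y) V_P = 
\Phi_{V_P^T S}(Y)$, the expression for the Hessian matrix $\mathsf{H}_\lambda I(S;Y)$ simplifies to 

$\mathsf{H}_\lambda I(S;Y) = -\frac{1}{4} (\mathrm{Diag}(\sqrt{\lambda}))^{-1} E\big\{ \Phi_{V_P^T S}(Y) \circ \mathrm{Diag}(\sigma \circ \sqrt{\lambda})\, \Phi_{V_P^T S}(Y)\, \mathrm{Diag}(\sigma \circ \sqrt{\lambda})$ 
$\quad + \mathrm{Diag}(\sigma \circ \sqrt{\lambda})\, \Phi_{V_P^T S}(Y) \circ \Phi_{V_P^T S}(Y)\, \mathrm{Diag}(\sigma \circ \sqrt{\lambda}) \big\} (\mathrm{Diag}(\sqrt{\lambda}))^{-1},$ (85) 

where we have applied Lemma A.2 and have taken into account that 

$2 (I_p \otimes \mathrm{Diag}(\sigma \circ \sqrt{\lambda}))\, N_p = (I_p \otimes \mathrm{Diag}(\sigma \circ \sqrt{\lambda})) + K_p (\mathrm{Diag}(\sigma \circ \sqrt{\lambda}) \otimes I_p),$ (86) 

together with $S_p^T K_p = S_p^T$. Now, from simple inspection of the expression in (85) and recalling the properties of 
the Schur product, the desired result follows. ■ 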

Remark 7: Observe from the expression for the Hessian in (69) that for the case where the channel covariance 
matrix $R_H$ is rank deficient, $\mathrm{rank}(R_H) = p' < p$, the last $p - p'$ entries of the vector $\sigma$ are zero, which implies 
that the last $p - p'$ rows and columns of the Hessian matrix are also zero. 

Remark 8: For the case where $m < p$, note that the matrix $U_P \in \mathbb{R}^{p \times m}$ with the left singular vectors of the 
precoder $P$ is not square. We thus consider that it contains the $m$ eigenvectors in $U_H$ associated with the $m$ largest 
eigenvalues of $R_H$. In this case, the Hessian matrix of the mutual information with respect to the squared singular 
values $\lambda \in \mathbb{R}^m$ is also negative semidefinite and its expression becomes 

$\mathsf{H}_\lambda I(S;Y) = -\frac{1}{2} \mathrm{Diag}(\tilde{\sigma})\, E\{\Phi_{V_P^T S}(Y) \circ \Phi_{V_P^T S}(Y)\}\, \mathrm{Diag}(\tilde{\sigma}),$ (87) 

where we have defined $\tilde{\sigma} = (\sigma_1\ \sigma_2\ \dots\ \sigma_m)^T$ and where we recall that, in this case, $V_P \in \mathbb{R}^{m \times m}$. 
We now recover a result obtained in [17], where it was proved that the mutual information is concave in the power 
allocation for the case of parallel channels. Note, however, that [17] considered independence of the elements in 
the signaling vector $S$, whereas the following result shows that it is not necessary. 

Corollary 2 (Mutual information concavity with respect to the power allocation in parallel channels): Particularizing 
Theorem 5 for the case where the channel $H$, the precoder $P$, and the noise covariance $R_Z$ are diagonal 
matrices, which implies that $U_P = U_H = I_p$, it follows that the mutual information is a concave function with 
respect to the power allocation for parallel non-interacting channels for an arbitrary distribution of the signaling 
vector $S$. 



C. General negative results 

In the previous section we have proved that the mutual information is a concave function of the squared singular 
values of the precoder matrix $P$ for the case where the left singular vectors of the precoder $P$ coincide with the 
eigenvectors of the channel covariance matrix, $R_H$. For the general case where these vectors do not coincide, the 
mutual information is not a concave function of the squared singular values of the precoder. This fact is formally 
established in the following theorem through a counterexample. 

Theorem 6 (General non-concavity of the mutual information): Consider $Y = HPS + Z$, where all the terms 
are defined as in Theorem 3. It then follows that, in general, the mutual information is not a concave function with 
respect to the squared singular values of the precoder $\lambda$. 

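Proof: We present a simple two-dimensional counterexample. Assume that the noise is white, $R_Z = I_2$, and 
consider the following channel matrix and precoder structure 

$H_{ce} = \begin{pmatrix} 1 & \beta \\ \beta & 1 \end{pmatrix}, \quad P_{ce} = \mathrm{Diag}(\sqrt{\lambda}) = \begin{pmatrix} \sqrt{\lambda_1} & 0 \\ 0 & \sqrt{\lambda_2} \end{pmatrix},$ (88) 

where $\beta \in (0, 1]$ and assume that the distribution for the signal vector $S$ has two equally likely mass points at the 
following positions 

$s^{(1)} = \begin{pmatrix} 2 \\ 0 \end{pmatrix}, \quad s^{(2)} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}.$ (89) 
Accordingly, we define the noiseless received vector as $r^{(k)} = H_{ce} P_{ce} s^{(k)}$, for $k = \{1, 2\}$, which yields 

$r^{(1)} = \begin{pmatrix} 2\sqrt{\lambda_1} \\ 2\beta\sqrt{\lambda_1} \end{pmatrix}, \quad r^{(2)} = \begin{pmatrix} 2\beta\sqrt{\lambda_2} \\ 2\sqrt{\lambda_2} \end{pmatrix}.$ (90) 

We now define the mutual information for this counterexample as 

$I_{ce}(\lambda_1, \lambda_2, \beta) = I(S; H_{ce} P_{ce} S + W).$ (91) 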

Since there are only two possible signals to be transmitted, $s^{(1)}$ and $s^{(2)}$, it is clear that $0 \le I_{ce} \le \log 2$. Moreover, we 
will use the fact that, as $R_Z = I_2$, the mutual information is an increasing function of the squared distance between the 
only two possible received vectors, $d^2(\lambda_1, \lambda_2, \beta) = \|r^{(1)} - r^{(2)}\|^2$, which is denoted by $I_{ce}(\lambda_1, \lambda_2, \beta) = f(d^2(\lambda_1, \lambda_2, \beta))$, 
where $f$ is an increasing function. 

For a fixed $\beta$, we want to study the concavity of $I_{ce}(\lambda_1, \lambda_2, \beta)$ with respect to $(\lambda_1, \lambda_2)$. In order to do so, we 
restrict ourselves to the study of concavity along straight lines of the type $\lambda_1 + \lambda_2 = \rho$, with $\rho > 0$, which is 
sufficient to disprove the concavity. 

Given three aligned points, such that the point in between is at the same distance from the other two, if a function 
is concave in $(\lambda_1, \lambda_2)$ it means that the average of the function evaluated at the two extreme points is smaller 
than or equal to the function evaluated at the midpoint. Consequently, concavity can be disproved by finding three 
aligned points such that the aforementioned concavity property is violated. 

Our three aligned points will be $(\rho, 0)$, $(0, \rho)$, and $(\rho/2, \rho/2)$ and, instead of working with the mutual information, 
we will work with the squared distance between the received points because closed-form expressions are available. 

Operating with the received vectors and recalling that $\beta \in (0, 1]$, we can easily obtain 

$d^2(\rho, 0, \beta) = d^2(0, \rho, \beta) = 4\rho(1 + \beta^2) > 4\rho$ (92) 
$d^2(\rho/2, \rho/2, \beta) = 4\rho(1 - \beta)^2 < 4\rho.$ (93) 

Fig. 2. Graphical representation of the mutual information $I_{ce}(\lambda_1, \lambda_2, \beta)$ in the counterexample for different values of the channel parameter 
$\beta$ along the line $\lambda_1 + \lambda_2 = \rho = 10$. It can be readily seen that, except for the case $\beta = 0$, the function is not concave. 



The first equality means that the mutual information evaluated at the extreme points takes the same value 
and is always above a certain threshold, $f(4\rho)$, independently of the value of $\beta$. Consequently, the mean of the 
mutual information evaluated at the two extreme points is equal to the value at either of the extreme points. The 
second equality means that the function evaluated at the midpoint is always below this same threshold. 
Now it is clear that, given any $\rho > 0$, we can always find $\beta$ such that $0 < \beta \le 1$ and that 

$I_{ce}(\rho, 0, \beta) = I_{ce}(0, \rho, \beta) > I_{ce}(\rho/2, \rho/2, \beta),$ (94) 

which contradicts the concavity hypothesis. ■ 
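The counterexample can be reproduced numerically. The sketch below is only illustrative: the symmetric channel with off-diagonal entry β and the mass points (2, 0) and (0, 2) are assumptions chosen to be consistent with the squared distances in (92) and (93), and the function names, grid step, and span are hypothetical. It evaluates the squared distances and the mutual information of two equiprobable points in unit-variance white Gaussian noise, which depends only on their distance.

```python
import math

def squared_distance(lam1, lam2, beta):
    """Squared distance between the two noiseless received points r = H_ce P_ce s."""
    h = [[1.0, beta], [beta, 1.0]]              # assumed channel matrix H_ce
    p = [math.sqrt(lam1), math.sqrt(lam2)]      # diagonal of P_ce
    points = [(2.0, 0.0), (0.0, 2.0)]           # assumed mass points s^(1), s^(2)
    r = [[sum(h[i][j] * p[j] * s[j] for j in range(2)) for i in range(2)]
         for s in points]
    return sum((r[0][i] - r[1][i]) ** 2 for i in range(2))

def two_point_mutual_information(d2, step=1e-3, span=12.0):
    """I(S;Y) in nats for two equiprobable points at squared distance d2, noise N(0, I)."""
    a = 0.5 * math.sqrt(d2)                     # equivalent antipodal amplitude
    n = int(round(2.0 * span / step))
    total = 0.0
    for i in range(n + 1):
        z = -span + i * step
        phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += phi * math.log1p(math.exp(-2.0 * a * (z + a))) * step
    return math.log(2.0) - total

rho, beta = 10.0, 0.5
d2_extreme = squared_distance(rho, 0.0, beta)
d2_middle = squared_distance(rho / 2.0, rho / 2.0, beta)
i_extreme = two_point_mutual_information(d2_extreme)
i_middle = two_point_mutual_information(d2_middle)
print(d2_extreme, d2_middle, i_extreme > i_middle)
```

Since both extreme points give the same mutual information, their average equals `i_extreme`, so `i_extreme > i_middle` is exactly the violation of midpoint concavity used in the proof.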
For illustrative purposes, in Fig. 2 we have depicted the mutual information for different values of the channel 
parameter $\beta$ for the counterexample in the proof of Theorem 6. Note that the function is only concave (and, in 
fact, linear) for the case where the channel is diagonal, $\beta = 0$, which agrees with the results in Theorem 5. 

D. Concavity results summary 

At the beginning of Section IV we have argued that the mutual information is concave with respect to the full 
transmitted signal covariance matrix $Q$ for the case where the signaling is Gaussian and also for the low SNR 
regime. Next we have discussed that this result cannot be generalized for arbitrary signaling distributions because, 
in the general case, the mutual information is not well defined as a function of $Q$ alone. 

In Sections IV-A and IV-B we have encountered two particular cases where the mutual information is a concave 
function. In the first case, we have seen that the mutual information is concave with respect to the SNR and, in the 
second, that the mutual information is a concave function of the squared singular values of the precoder, provided 
that the eigenvectors of the channel covariance $R_H$ and the left singular vectors of the precoder $P$ coincide. For 
the general case where these vectors do not coincide, we have shown in Section IV-C that the mutual 
information is not concave in the squared singular values. 

A summary of the different concavity results for the mutual information as a function of the configuration of the 
linear vector Gaussian channel can be found in Table I. 

TABLE I 

Summary of the concavity type of the mutual information. 
(✓ indicates concavity, ✗ indicates non-concavity, and - indicates that it does not apply) 

| Cases | Scalar: power, snr, $P = \sqrt{\mathrm{snr}}\, I$ | Vector: squared singular values, $\lambda$, $P = U_P \mathrm{Diag}(\sqrt{\lambda}) V_P^T$ | Matrix: transmit covariance, $Q$, $Q = P R_S P^T$ | 
|---|---|---|---| 
| General case: $Y = HPS + Z$ | ✓ [5], [10] (Section IV-A) | ✗ (Section IV-C) | - (Section III-E) | 
| Channel covariance $R_H$ and precoder $P$ share singular/eigenvectors: $U_P = U_H$ | ✓ | ✓ (Section IV-B) | - (Section III-E) | 
| Independent parallel communication: $R_H = U_P = V_P = I$, $P_S = \prod_i P_{S_i}$ | ✓ | ✓ [17] | ✓ [17] (note that $Q$ is diagonal) | 
| Low SNR regime: $R_Z = N_0 I_n$, $N_0 \gg 1$ | ✓ | ✓ | ✓ [16] | 
| Gaussian signaling: $S = S_G$ | ✓ | ✓ | ✓ | 


V. Multivariate extension of Costa's entropy power inequality 

Having proved that the mutual information and, hence, the differential entropy are concave functions of the 
squared singular values $\lambda$ of the precoder $P$ for the case where the left singular vectors of the precoder coincide 
with the eigenvectors of the channel covariance $R_H$, $\mathsf{H}_\lambda I(S;Y) = \mathsf{H}_\lambda h(Y) \le 0$, we want to study if this last 
result can be strengthened by proving the concavity in $\lambda$ of the entropy power. 

The entropy power of the random vector $Y \in \mathbb{R}^n$ was first introduced by Shannon in his seminal work [18] and 
is, since then, defined as 

$N(Y) = \frac{1}{2\pi e} \exp\left( \frac{2}{n} h(Y) \right),$ (95) 

where $h(Y)$ represents the differential entropy. The entropy power of a random vector $Y$ represents 
the variance (or power) of a standard Gaussian random vector $Y_G \sim \mathcal{N}(0, \sigma^2 I_n)$ such that both $Y$ and $Y_G$ have 
identical differential entropy, $h(Y_G) = h(Y)$. 
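As a quick sanity check of the definition in (95), the entropy power of a white Gaussian vector with covariance σ² I_n should return σ². The snippet below is a minimal sketch with illustrative function names; it uses the standard closed form h = (n/2) log(2πe σ²) for the Gaussian differential entropy in nats.

```python
import math

def entropy_power(h, n):
    """Entropy power N(Y) = exp(2 h / n) / (2 pi e), with h in nats, as in (95)."""
    return math.exp(2.0 * h / n) / (2.0 * math.pi * math.e)

def white_gaussian_entropy(sigma2, n):
    """Differential entropy (nats) of an n-dimensional N(0, sigma2 * I_n) vector."""
    return 0.5 * n * math.log(2.0 * math.pi * math.e * sigma2)

# The entropy power of a white Gaussian vector recovers its per-component variance.
print(entropy_power(white_gaussian_entropy(2.5, 4), 4))
```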

Costa proved in [10] that, provided that the random vector $W$ is white Gaussian distributed, then 

$N(X + \sqrt{t}\, W) \ge (1 - t) N(X) + t N(X + W),$ (96) 

where $t \in [0, 1]$. As Costa noted, the above entropy power inequality (EPI) is equivalent to the concavity of the 
entropy power function $N(X + \sqrt{t}\, W)$ with respect to the parameter $t$, or, formally (footnote 5), 

$\frac{d^2}{dt^2} N(X + \sqrt{t}\, W) = \mathsf{H}_t N(X + \sqrt{t}\, W) \le 0.$ (97) 

Due to its inherent interest and to the fact that the proof by Costa was rather involved, simplified proofs of his 
result have been subsequently given in [19]-[22]. 
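The concavity in t of the entropy power can be illustrated for a simple non-Gaussian input. The following sketch assumes a scalar binary X = ±1 with equal probabilities and W ~ N(0, 1); the function name, grid step, span, and the three test values of t are illustrative choices. It computes h(X + √t W) by numerical integration of the Gaussian-mixture density and evaluates a midpoint concavity check.

```python
import math

def entropy_power_binary_plus_noise(t, step=1e-3, span=12.0):
    """N(X + sqrt(t) W) for scalar X = +/-1 (equiprobable) and W ~ N(0, 1)."""
    s = math.sqrt(t)
    lo = -1.0 - span * s
    n = int(round(2.0 * (1.0 + span * s) / step))
    h = 0.0
    for i in range(n + 1):
        y = lo + i * step
        # Density of X + sqrt(t) W: equal-weight mixture of N(+1, t) and N(-1, t).
        p = 0.5 * (math.exp(-0.5 * ((y - 1.0) / s) ** 2)
                   + math.exp(-0.5 * ((y + 1.0) / s) ** 2)) / (s * math.sqrt(2.0 * math.pi))
        if p > 0.0:
            h -= p * math.log(p) * step          # differential entropy in nats
    return math.exp(2.0 * h) / (2.0 * math.pi * math.e)

n_low = entropy_power_binary_plus_noise(0.2)
n_mid = entropy_power_binary_plus_noise(0.6)
n_high = entropy_power_binary_plus_noise(1.0)
# Midpoint concavity: the value at t = 0.6 should not fall below the average of the ends.
print(n_low, n_mid, n_high, n_mid >= 0.5 * (n_low + n_high))
```

A midpoint test on three values is of course weaker than the second-derivative statement in (97); it is only meant as a numerical illustration.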

Additionally, in his paper Costa presented two extensions of his main result in (97). Precisely, he showed that 
the EPI is also valid when the Gaussian vector $W$ is not white, and also for the case where the $t$ parameter is 
multiplying the arbitrarily distributed random vector $X$ instead: 

$\mathsf{H}_t N(\sqrt{t}\, X + W) \le 0.$ (98) 

Observe that $\sqrt{t}$ in (98) plays the role of a scalar precoder. We next consider an extension of (98) to the case 
where the scalar precoder $\sqrt{t}$ is replaced by a multivariate precoder $P \in \mathbb{R}^{p \times m}$ and a channel $H \in \mathbb{R}^{n \times p}$ for the 
particular case where the precoder left singular vectors coincide with the channel covariance eigenvectors. Similarly 
as in Section IV-B, we assume that $m \ge p$. The case $m < p$ is presented after the proof of the following theorem. 

Theorem 7 (Costa's multivariate EPI): Consider $Y = HPS + Z$, where all the terms are defined as in Theorem 
3, for the particular case where the eigenvectors of the channel covariance matrix $R_H$ coincide with the left singular 
vectors of the precoder $P \in \mathbb{R}^{p \times m}$ and where we assume that $m \ge p$. It then follows that the entropy power $N(Y)$ 
is a concave function of $\lambda$, i.e., 

$\mathsf{H}_\lambda N(Y) \le 0.$ 

Moreover, the Hessian matrix of the entropy power function $N(Y)$ with respect to $\lambda$ is given by 

$\mathsf{H}_\lambda N(Y) = \frac{N(Y)}{n} \mathrm{Diag}(\sigma) \left( \frac{\mathrm{diag}(E_{V_P^T S})\, \mathrm{diag}(E_{V_P^T S})^T}{n} - E\{\Phi_{V_P^T S}(Y) \circ \Phi_{V_P^T S}(Y)\} \right) \mathrm{Diag}(\sigma),$ (99) 

where we recall that $\mathrm{diag}(E_{V_P^T S})$ is a column vector with the diagonal entries of the matrix $E_{V_P^T S}$. 

Footnote 5: The equivalence between equations (97) and (96) is due to the fact that the function $N(X + \sqrt{t}\, W)$ is twice differentiable almost 
everywhere thanks to the smoothing properties of the added Gaussian noise. 

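Proof: First, let us prove (99). From the definition of the entropy power in (95) and applying the chain rule 
for Hessians in (143) we obtain 

$\mathsf{H}_\lambda N(Y) = \mathsf{D}_\lambda^T h(Y) \cdot \mathsf{H}_{h(Y)} N(Y) \cdot \mathsf{D}_\lambda h(Y) + \mathsf{D}_{h(Y)} N(Y) \cdot \mathsf{H}_\lambda h(Y)$ 
$= \frac{2 N(Y)}{n} \left( \frac{2\, \mathsf{D}_\lambda^T h(Y)\, \mathsf{D}_\lambda h(Y)}{n} + \mathsf{H}_\lambda h(Y) \right).$ (100) 

Now, recalling from [5, Eq. (61)] that $\mathsf{D}_\lambda^T h(Y) = (1/2)\, \mathrm{Diag}(\sigma)\, \mathrm{diag}(E_{V_P^T S})$ and incorporating the expression 
for $\mathsf{H}_\lambda h(Y)$ calculated in Theorem 5, the result in (99) follows. 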

Now that an explicit expression for the Hessian matrix has been obtained, we wish to prove that it is negative 
semidefinite. Note from (100) that, except for the positive factor $2N(Y)/n$, the Hessian matrix $\mathsf{H}_\lambda N(Y)$ is the 
sum of a rank-one positive semidefinite matrix and the Hessian matrix of the differential entropy, which is negative 
semidefinite according to Theorem 5. Consequently, the definiteness of $\mathsf{H}_\lambda N(Y)$ is, a priori, undetermined, and 
some further developments are needed to determine it, which is what we do next. 

First consider the positive semidefinite matrix $A(y) \in \mathbb{S}_+^{p'}$, which is obtained by selecting the first $p' = \mathrm{rank}(R_H)$ 
columns and rows of the positive semidefinite matrix $\mathrm{Diag}(\sqrt{\sigma})\, \Phi_{V_P^T S}(y)\, \mathrm{Diag}(\sqrt{\sigma})$, 

$[A(y)]_{ij} = \left[ \mathrm{Diag}(\sqrt{\sigma})\, \Phi_{V_P^T S}(y)\, \mathrm{Diag}(\sqrt{\sigma}) \right]_{ij}, \quad \{i,j\} = 1, \dots, p'.$ (101) 

With this definition, it is now easy to see that the expression 

$\frac{E\{\mathrm{diag}(A(Y))\}\, E\{\mathrm{diag}(A(Y))^T\}}{n} - E\{A(Y) \circ A(Y)\}$ (102) 

coincides (up to the positive factor $N(Y)/n$) with the first $p'$ rows and columns of the Hessian matrix $\mathsf{H}_\lambda N(Y)$ in 
(99). Recalling that the remaining elements of the Hessian matrix $\mathsf{H}_\lambda N(Y)$ are zero due to the presence of the 
matrix $\mathrm{Diag}(\sigma)$, it is sufficient to show that the expression in (102) is negative semidefinite to prove the negative 
semidefiniteness of $\mathsf{H}_\lambda N(Y)$. 

Now, we apply Proposition G.9 to $A(y)$, yielding 

$A(y) \circ A(y) \ge \frac{\mathrm{diag}(A(y))\, \mathrm{diag}(A(y))^T}{p'}.$ (103) 

Taking the expectation on both sides of (103), we have 

$E\{A(Y) \circ A(Y)\} \ge \frac{E\{\mathrm{diag}(A(Y))\, \mathrm{diag}(A(Y))^T\}}{p'}.$ (104) 

From Lemma G.10 we know that 

$E\{\mathrm{diag}(A(Y))\, \mathrm{diag}(A(Y))^T\} \ge E\{\mathrm{diag}(A(Y))\}\, E\{\mathrm{diag}(A(Y))^T\},$ 

from which it follows that 

$E\{A(Y) \circ A(Y)\} \ge \frac{E\{\mathrm{diag}(A(Y))\}\, E\{\mathrm{diag}(A(Y))^T\}}{p'}.$ 

Since the operators $\mathrm{diag}(\cdot)$ and expectation commute, we finally obtain the desired result as 

$E\{A(Y) \circ A(Y)\} \ge \frac{\mathrm{diag}(E\{A(Y)\})\, \mathrm{diag}(E\{A(Y)\})^T}{p'} \ge \frac{\mathrm{diag}(E\{A(Y)\})\, \mathrm{diag}(E\{A(Y)\})^T}{n},$ 

where in the last inequality we have used that $p' = \mathrm{rank}(R_H) \le \min\{n, p\} \le n$, as $R_H = H^T R_Z^{-1} H$ and $H \in \mathbb{R}^{n \times p}$. ■ 

Remark 9: For the case where $m < p$, we assume that the matrix $U_P \in \mathbb{R}^{p \times m}$ contains the $m$ eigenvectors in 
$U_H$ associated with the $m$ largest eigenvalues of $R_H$. It then follows that the Hessian matrix $\mathsf{H}_\lambda N(Y)$ is also 
negative semidefinite and its expression is the same as in (99), simply replacing $\sigma$ by $\tilde{\sigma} = (\sigma_1 \dots \sigma_m)^T$. 

Remark 10: For the case where $R_H = I_p$ and $p = m$ we recover our results in [23]. 

Another possible multivariate generalization of Costa's EPI would be to study the concavity of $N(X + Z)$ 
with respect to the covariance of the noise vector, $R_Z$. Numerical computations seem to indicate that the entropy 
power is indeed concave in $R_Z$. However, a proof has been elusive, mainly due to the fact that, unlike 
the conditional MMSE matrix $\Phi_S(y)$, the conditional Fisher information matrix $\Gamma_Y(y)$, which appears when differentiating 
with respect to $R_Z$, is not a positive definite function $\forall y$. 



VI. Extensions to the complex field 

So far, the presented results hold for the case where all the variables and parameters take values from the field 
of real numbers. Due to the simplicity of working with baseband equivalent models, it is a common practice when 
studying communication systems to model the parameters and random variables in the complex field, and work 
with the following complex linear vector Gaussian channel: 



$Y_c = G_c S_c + Z_c,$ (105) 

where $G_c \in \mathbb{C}^{n \times m}$ and all the other dimensions are defined accordingly and the noise $Z_c$ is a zero mean circularly 
symmetric (or proper [24]) complex Gaussian random vector with covariance $E\{Z_c Z_c^\dagger\} = R_{Z_c}$. The complex 
model in (105) can be equivalently rewritten as an extended double-dimensional real model. We 
consider the extended vectors and matrices 

$Y_r = \begin{pmatrix} \Re Y_c \\ \Im Y_c \end{pmatrix}, \quad S_r = \begin{pmatrix} \Re S_c \\ \Im S_c \end{pmatrix}, \quad G_r = \begin{pmatrix} \Re G_c & -\Im G_c \\ \Im G_c & \Re G_c \end{pmatrix}, \quad Z_r = \begin{pmatrix} \Re Z_c \\ \Im Z_c \end{pmatrix},$ (106) 

and then rewrite the input-output relation in (105) according to the real model 

$Y_r = G_r S_r + Z_r.$ (107) 

With these definitions, we have that, for example, $h(Y_c) = h(Y_r)$ or $I(S_c; Y_c) = I(S_r; Y_r)$ [24]. 
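The equivalence between the complex model and its extended real model can be verified on a small noiseless example. The sketch below assumes the standard stacking of real and imaginary parts; the helper names are illustrative. It checks that the real extended matrix applied to the stacked input reproduces the stacked complex output.

```python
def real_extended_matrix(g_c):
    """Build [[Re G, -Im G], [Im G, Re G]] from a complex matrix given as nested lists."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in g_c]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in g_c]
    return top + bottom

def real_extended_vector(v_c):
    """Stack [Re v; Im v] for a complex vector given as a list."""
    return [z.real for z in v_c] + [z.imag for z in v_c]

def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

g_c = [[1 + 2j, -1j], [0.5 + 0j, 3 - 1j]]
s_c = [2 - 1j, 1 + 1j]
y_c = matvec(g_c, s_c)                                        # complex model without noise
y_r = matvec(real_extended_matrix(g_c), real_extended_vector(s_c))
print(y_r, real_extended_vector(y_c))
```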

Working with the real model in (107), it is possible to calculate, for example, the Jacobian of the mutual 
information with respect to the complex precoder $G_c$ by using the results for the real case and the chain rule as 

$\mathsf{D}_{G_c} I(S_c; Y_c) = \mathsf{D}_{G_c} I(S_r; Y_r) = \frac{1}{2} \left( \mathsf{D}_{\Re G_c} I(S_r; Y_r) - j\, \mathsf{D}_{\Im G_c} I(S_r; Y_r) \right)$ (108) 
$= \frac{1}{2} \left( \mathsf{D}_{G_r} I(S_r; Y_r)\, \mathsf{D}_{\Re G_c} G_r - j\, \mathsf{D}_{G_r} I(S_r; Y_r)\, \mathsf{D}_{\Im G_c} G_r \right),$ (109) 

where we have used the convention for the complex derivative defined in [25] and where the Jacobians $\mathsf{D}_{\Re G_c} G_r$ 
and $\mathsf{D}_{\Im G_c} G_r$ can be found using the definition in (106) and the results in [15, Chapter 9]. Similarly, expressions 
for $\mathsf{H}_{G_c} I(S_c; Y_c)$ or $\mathsf{H}_{G_c^*} I(S_c; Y_c)$ can also be obtained by successive application of the complex derivative 
definition and the chain rule. 

In the following we present a simplified complex counterpart of the Hessian result in Theorem 5 for the real 
case, which, despite its simplicity, illustrates the particularities of the complex case. 

Theorem 8 (Mutual information Hessian in the complex case): Consider the complex signal model $Y_c = 
\mathrm{Diag}(\sqrt{\lambda_c})\, S_c + W_c$, where $\mathrm{Diag}(\sqrt{\lambda_c}) \in \mathbb{R}^{n \times n}$ is an arbitrary deterministic diagonal matrix ($\lambda_c \in \mathbb{R}^n$), the 
signaling $S_c \in \mathbb{C}^n$ is arbitrarily distributed, and the noise vector $W_c \in \mathbb{C}^n$ follows a white Gaussian proper 
distribution and is independent of the input $S_c$. Then, the differential entropy of the output vector $Y_c$, $h(Y_c)$, 
satisfies 

$\mathsf{H}_{\lambda_c} h(Y_c) = -E\{\Phi_{S_c}(Y) \circ \Phi_{S_c}^*(Y) + \bar{\Phi}_{S_c}(Y) \circ \bar{\Phi}_{S_c}^*(Y)\},$ (110) 

where we have defined 

$\Phi_{S_c}(y) = E\{(S_c - E\{S_c \mid y\})(S_c - E\{S_c \mid y\})^\dagger \mid y\}$ (111) 
$\bar{\Phi}_{S_c}(y) = E\{(S_c - E\{S_c \mid y\})(S_c - E\{S_c \mid y\})^T \mid y\}.$ (112) 

Proof: The real extended model of $Y_c = \mathrm{Diag}(\sqrt{\lambda_c})\, S_c + W_c$ is readily obtained as 

$Y_r = \mathrm{Diag}(\sqrt{\lambda_r})\, S_r + W_r = \begin{pmatrix} \mathrm{Diag}(\sqrt{\lambda_c}) & 0 \\ 0 & \mathrm{Diag}(\sqrt{\lambda_c}) \end{pmatrix} \begin{pmatrix} \Re S_c \\ \Im S_c \end{pmatrix} + W_r,$ (113) 

where now we have $E\{W_r W_r^T\} = \frac{1}{2} I_{2n}$. 

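Now, applying the chain rule for $\lambda_r^T = [\lambda_c^T\ \lambda_c^T]$, the elements of the Hessian matrix read as 

$[\mathsf{H}_{\lambda_c} h(Y_c)]_{ij} = \frac{\partial^2 h(Y_c)}{\partial \lambda_{c,i}\, \partial \lambda_{c,j}} = \frac{\partial^2 h(Y_r)}{\partial \lambda_{r,i}\, \partial \lambda_{r,j}} + \frac{\partial^2 h(Y_r)}{\partial \lambda_{r,i+n}\, \partial \lambda_{r,j}} + \frac{\partial^2 h(Y_r)}{\partial \lambda_{r,i}\, \partial \lambda_{r,j+n}} + \frac{\partial^2 h(Y_r)}{\partial \lambda_{r,i+n}\, \partial \lambda_{r,j+n}}.$ 

The four terms in the complex Hessian can be identified with the elements of the Hessian for the real case, which, 
thanks to Theorem 5, can be written as 

$\frac{\partial^2 h(Y_r)}{\partial \lambda_{r,i}\, \partial \lambda_{r,j}} = -2 E\{ (E\{ (S_{r,i} - E\{S_{r,i} \mid Y\}) (S_{r,j} - E\{S_{r,j} \mid Y\}) \mid Y \})^2 \}.$ (114) 
Noting that $S_{r,i} = \Re S_{c,i}$ and $S_{r,i+n} = \Im S_{c,i}$, we can finally write 

$[\mathsf{H}_{\lambda_c} h(Y_c)]_{ij} = -2 E\{ (E\{ \Re(S_{c,i} - E\{S_{c,i} \mid Y\})\, \Re(S_{c,j} - E\{S_{c,j} \mid Y\}) \mid Y \})^2 \}$ (115) 
$\quad - 2 E\{ (E\{ \Re(S_{c,i} - E\{S_{c,i} \mid Y\})\, \Im(S_{c,j} - E\{S_{c,j} \mid Y\}) \mid Y \})^2 \}$ (116) 
$\quad - 2 E\{ (E\{ \Im(S_{c,i} - E\{S_{c,i} \mid Y\})\, \Re(S_{c,j} - E\{S_{c,j} \mid Y\}) \mid Y \})^2 \}$ (117) 
$\quad - 2 E\{ (E\{ \Im(S_{c,i} - E\{S_{c,i} \mid Y\})\, \Im(S_{c,j} - E\{S_{c,j} \mid Y\}) \mid Y \})^2 \}.$ (118) 

Now, with the definitions in (112) and (111) and a slight amount of algebra, the result follows. ■ 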
It is important to highlight that, whereas in the real case the conditional MMSE matrix $\Phi_S(y)$ was enough to 
compute the Hessian, in the complex case, in addition to the conditional MMSE matrix (as defined in (111)), there 
is an extra matrix $\bar{\Phi}_{S_c}(y)$, defined as in (112), which is referred to as the conditional pseudo-MMSE matrix. 
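The algebra that closes the proof of Theorem 8 can be sketched as follows. The shorthand $e = S_c - E\{S_c \mid Y\}$, $a = \Re e$, $b = \Im e$, the abbreviation $\langle \cdot \rangle = E\{\cdot \mid Y\}$, and the bar notation for the pseudo-MMSE matrix are introduced here only for illustration and are not notation taken from the paper.

```latex
\begin{align*}
[\Phi_{S_c}(Y)]_{ij} &= \langle e_i e_j^* \rangle
  = \big(\langle a_i a_j\rangle + \langle b_i b_j\rangle\big)
  + j\big(\langle b_i a_j\rangle - \langle a_i b_j\rangle\big), \\
[\bar{\Phi}_{S_c}(Y)]_{ij} &= \langle e_i e_j \rangle
  = \big(\langle a_i a_j\rangle - \langle b_i b_j\rangle\big)
  + j\big(\langle a_i b_j\rangle + \langle b_i a_j\rangle\big), \\
\big|[\Phi_{S_c}(Y)]_{ij}\big|^2 + \big|[\bar{\Phi}_{S_c}(Y)]_{ij}\big|^2
  &= 2\big(\langle a_i a_j\rangle^2 + \langle a_i b_j\rangle^2
  + \langle b_i a_j\rangle^2 + \langle b_i b_j\rangle^2\big).
\end{align*}
```

Taking the expectation over $Y$ and changing the sign, the right-hand side of the last line is the sum of the four terms in (115) to (118), while the left-hand side is the $(i,j)$ entry of $\Phi_{S_c} \circ \Phi_{S_c}^* + \bar{\Phi}_{S_c} \circ \bar{\Phi}_{S_c}^*$, which gives (110).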



Appendix 

A. The commutation $K_{q,r}$, symmetrization $N_q$, duplication $D_q$, and reduction $S_q$ matrices 

In this appendix we present four matrices that are very important when calculating Hessian matrices. The 
definitions of the commutation $K_{q,r}$, symmetrization $N_q$, and duplication $D_q$ matrices have been taken from [15] 
and the reduction matrix $S_q$ has been defined by the authors of the present work. 

Given any matrix $A \in \mathbb{R}^{q \times r}$, the two vectors $\mathrm{vec}\, A$ and $\mathrm{vec}\, A^T$ contain the same elements but arranged in a 
different order. Consequently, there exists a unique permutation matrix $K_{q,r} \in \mathbb{R}^{qr \times qr}$, independent of $A$, which is 
called the commutation matrix, that satisfies 

$\mathrm{vec}\, A^T = K_{q,r}\, \mathrm{vec}\, A, \quad \text{and} \quad K_{q,r}^T = K_{q,r}^{-1} = K_{r,q}.$ (119) 

It is easy to verify that the entries of the commutation matrix satisfy 

$[K_{q,r}]_{i+(j-1)r,\; i'+(j'-1)q} = \delta_{ij'}\, \delta_{ji'}, \quad \{i', j\} \in [1,q], \quad \{i, j'\} \in [1,r].$ (120) 

The main reason why we have introduced the commutation matrix is the property from which it obtains 
its name, as it enables us to commute the two matrices of a Kronecker product [15, Theorem 3.9], 

$K_{s,q} (A \otimes B) = (B \otimes A) K_{t,r},$ (121) 

where we have considered $A \in \mathbb{R}^{q \times r}$ and $B \in \mathbb{R}^{s \times t}$. 
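The defining property in (119) and the commutation property in (121) can be checked on small integer matrices. This is a minimal sketch using nested lists and 0-based indices; the helper names are illustrative and the matrices are arbitrary test data.

```python
def vec(m):
    """Stack the columns of a nested-list matrix into one list."""
    return [m[i][j] for j in range(len(m[0])) for i in range(len(m))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def kron(x, y):
    """Kronecker product of nested-list matrices."""
    return [[xv * yv for xv in xrow for yv in yrow] for xrow in x for yrow in y]

def commutation_matrix(q, r):
    """K_{q,r} such that vec(A^T) = K_{q,r} vec(A) for any q x r matrix A."""
    k = [[0] * (q * r) for _ in range(q * r)]
    for a in range(q):            # row index of A
        for b in range(r):        # column index of A
            # A[a][b] sits at a + b*q in vec(A) and at b + a*r in vec(A^T).
            k[b + a * r][a + b * q] = 1
    return k

A = [[1, 2, 3], [4, 5, 6]]        # q x r = 2 x 3
B = [[1, -2], [0, 3], [5, 1]]     # s x t = 3 x 2
col = [[v] for v in vec(A)]
print(matmul(commutation_matrix(2, 3), col) == [[v] for v in vec(transpose(A))])
print(matmul(commutation_matrix(3, 2), kron(A, B)) == matmul(kron(B, A), commutation_matrix(2, 3)))
```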






We also define $K_q = K_{q,q}$ for the case where the commutation matrix is square. An important property of the 
square matrix $K_q$ is given in the following lemma. 
Lemma A.1: Let $A \in \mathbb{R}^{q \times r}$ and $B \in \mathbb{R}^{q \times t}$. Then, 

$[A \otimes B]_{i+(j-1)q,\; k+(l-1)t} = A_{jl} B_{ik},$ 
$[K_q (A \otimes B)]_{i+(j-1)q,\; k+(l-1)t} = A_{il} B_{jk}, \quad \{i,j\} \in [1,q], \quad k \in [1,t], \quad l \in [1,r].$ (122) 

Proof: The equality for the entries of the product $A \otimes B$ follows straightforwardly from the definition [26, 
Section 4.2]. Concerning the entries of $K_q (A \otimes B)$, we have 

$[K_q (A \otimes B)]_{i+(j-1)q,\; k+(l-1)t} = \sum_{i'=1}^{q} \sum_{j'=1}^{q} [K_q]_{i+(j-1)q,\; i'+(j'-1)q}\, [A \otimes B]_{i'+(j'-1)q,\; k+(l-1)t}$ (123) 
$= \sum_{i'=1}^{q} \sum_{j'=1}^{q} \delta_{ij'}\, \delta_{ji'}\, A_{j'l} B_{i'k}$ (124) 
$= A_{il} B_{jk},$ (125) 

where we have used the expression for the elements of $K_q$ in (120). ■ 
When calculating Jacobian and Hessian matrices, the form $I_{q^2} + K_q$ is usually encountered. Hence, we define the 
symmetrization matrix $N_q = \frac{1}{2}(I_{q^2} + K_q)$, which is singular and has the following properties 

$N_q = N_q^T = N_q^2$ (126) 
$N_q = N_q K_q = K_q N_q.$ (127) 

The name of the symmetrization matrix comes from the fact that, given any square matrix $A \in \mathbb{R}^{q \times q}$, 

$N_q\, \mathrm{vec}\, A = \frac{1}{2} (\mathrm{vec}\, A + \mathrm{vec}\, A^T) = \frac{1}{2} \mathrm{vec}(A + A^T).$ (128) 

The last important property of the symmetrization matrix is 

$N_q (A \otimes A) = (A \otimes A) N_q,$ (129) 

which follows from the definition $N_q = \frac{1}{2}(I_{q^2} + K_q)$ together with (121). 

Another important matrix related to the calculation of Jacobian and Hessian matrices, especially when symmetric 
matrices are involved, is the duplication matrix $D_q$. Given any symmetric matrix $R \in \mathbb{S}^q$, we denote by $\mathrm{vech}\, R$ the 
$\frac{1}{2} q(q+1)$-dimensional vector that is obtained from $\mathrm{vec}\, R$ by eliminating all the repeated elements that lie strictly 
above the diagonal of $R$. Then, the duplication matrix $D_q \in \mathbb{R}^{q^2 \times \frac{1}{2} q(q+1)}$ fulfills [15, Section 3.8] 

$\mathrm{vec}\, R = D_q\, \mathrm{vech}\, R,$ (130) 

for any $q$-dimensional symmetric matrix $R$. The duplication matrix takes its name from the fact that it duplicates 
the entries of $\mathrm{vech}\, R$ which correspond to off-diagonal elements of $R$ to produce the elements of $\mathrm{vec}\, R$. 
Since $D_q$ has full column rank, it is possible to invert the transformation in (130) to obtain 

$\mathrm{vech}\, R = D_q^+\, \mathrm{vec}\, R = (D_q^T D_q)^{-1} D_q^T\, \mathrm{vec}\, R.$ (131) 

The most important properties of the duplication matrix are [15, Theorem 3.12] 

$K_q D_q = D_q, \quad N_q D_q = D_q, \quad D_q D_q^+ = N_q, \quad D_q^+ N_q = D_q^+.$ (132) 
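The relations (130) and (131), together with the symmetrizing action of $D_q D_q^+ = N_q$ in (128), can be checked with exact arithmetic. The sketch below is illustrative: the helper names are assumptions, and the pseudo-inverse is computed directly from $(D_q^T D_q)^{-1} D_q^T$, using that $D_q^T D_q$ is diagonal because the columns of $D_q$ have disjoint supports.

```python
from fractions import Fraction

def vec(m):
    """Stack the columns of a square nested-list matrix."""
    return [m[i][j] for j in range(len(m)) for i in range(len(m))]

def vech(m):
    """Stack the entries on and below the diagonal, column by column."""
    return [m[i][j] for j in range(len(m)) for i in range(j, len(m))]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def duplication_matrix(q):
    """D_q with vec(R) = D_q vech(R) for any symmetric q x q matrix R."""
    pairs = [(i, j) for j in range(q) for i in range(j, q)]   # vech ordering, i >= j
    d = [[0] * len(pairs) for _ in range(q * q)]
    for col, (i, j) in enumerate(pairs):
        d[i + j * q][col] = 1      # position of R[i][j] in vec(R)
        d[j + i * q][col] = 1      # position of the mirrored entry R[j][i]
    return d

def duplication_pinv(q):
    """D_q^+ = (D_q^T D_q)^{-1} D_q^T, with D_q^T D_q diagonal (entries 1 or 2)."""
    d = duplication_matrix(q)
    cols = len(d[0])
    weights = [sum(d[row][c] for row in range(q * q)) for c in range(cols)]
    return [[Fraction(d[row][c], weights[c]) for row in range(q * q)] for c in range(cols)]

R = [[2, -1, 4], [-1, 3, 0], [4, 0, 5]]
print(matvec(duplication_matrix(3), vech(R)) == vec(R))
print(matvec(duplication_pinv(3), vec(R)) == vech(R))
```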

The last one of the matrices introduced in this appendix is the reduction matrix S q G R q2xq . The entries of the 
reduction matrix are defined as 

[S 3 ]i + (,-_i) 9) jfc = 5ij5 ik = 5 ijk , {i,j, k} G [l,q] (133) 
from which it is easy to verify that the reduction matrix fulfills 

K q S q = S q , N q S q = S q . (134) 
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However, the most important property of the reduction matrix is that it can be used to reduce the Kronecker product 
of two matrices to their Schur product as it is detailed in the next lemma. 
Lemma A.2: Let A G R qxr , B 6 ]R<? xr . Then, 

Sj(A®B)S r = Sj(B®A)S r = AoB. (135) 

Proof: From the expression for the elements of the Kronecker product in Lemma A.1 and the expression for
the elements of the reduction matrix we have that, for any $i \in [1, q]$ and $j \in [1, r]$,

$$[S_q^\top (A \otimes B) S_r]_{ij} = \sum_{k,l=1}^{q} \sum_{k',l'=1}^{r} [S_q]_{k+(l-1)q,\,i}\, [A \otimes B]_{k+(l-1)q,\,k'+(l'-1)r}\, [S_r]_{k'+(l'-1)r,\,j} \tag{136}$$

$$= \sum_{k,l=1}^{q} \sum_{k',l'=1}^{r} A_{ll'} B_{kk'}\, \delta_{kli}\, \delta_{k'l'j} \tag{137}$$

$$= A_{ij} B_{ij}, \tag{138}$$

from which the result in the lemma follows. ■
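Lemma A.2 is easy to confirm numerically. The sketch below assumes NumPy's `np.kron` convention for the Kronecker product and uses arbitrary test dimensions; the helper name `reduction_matrix` is an illustrative choice.

```python
import numpy as np

def reduction_matrix(q):
    """S_q with a single 1 in column k, at row k + k*q (0-indexed), cf. (133)."""
    S = np.zeros((q * q, q))
    for k in range(q):
        S[k + k * q, k] = 1.0
    return S

q, r = 3, 5
rng = np.random.default_rng(1)
A = rng.standard_normal((q, r))
B = rng.standard_normal((q, r))
S_q, S_r = reduction_matrix(q), reduction_matrix(r)

schur_from_kron_ab = S_q.T @ np.kron(A, B) @ S_r
schur_from_kron_ba = S_q.T @ np.kron(B, A) @ S_r
print(np.allclose(schur_from_kron_ab, A * B), np.allclose(schur_from_kron_ba, A * B))
```

The reduction matrices simply select the rows and columns of the Kronecker product whose two block indices coincide, which is why the Schur product appears as a submatrix.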
Finally, to conclude this appendix, we present two basic lemmas concerning the Kronecker product and the vec 
operator. 

Lemma A.3: Let $A$, $B$, $F$, and $T$ be four matrices such that the products $AB$ and $FT$ are defined. Then,
$(A \otimes F)(B \otimes T) = AB \otimes FT$.

Proof: See [15, Chapter 2]. ■
Lemma A.4: Let $A$, $T$, and $B$ be three matrices such that the product $ATB$ is defined. Then,

$$\operatorname{vec}(ATB) = (B^\top \otimes A) \operatorname{vec} T. \tag{139}$$

Proof: See [15, Theorem 2.2] or [27, Proposition 7.1.6]. ■
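Both lemmas can be illustrated with a few lines of NumPy. This is only a sanity check on random matrices, assuming column-major vectorization; the dimensions and variable names are arbitrary.

```python
import numpy as np

def vec(M):
    """Column-major vectorization."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(2)

# Lemma A.3: (A kron F)(B kron T) = AB kron FT
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
F = rng.standard_normal((5, 3))
T = rng.standard_normal((3, 6))
mixed_product_ok = np.allclose(np.kron(A, F) @ np.kron(B, T), np.kron(A @ B, F @ T))

# Lemma A.4: vec(A T B) = (B^T kron A) vec T
A2 = rng.standard_normal((3, 4))
T2 = rng.standard_normal((4, 2))
B2 = rng.standard_normal((2, 5))
vec_identity_ok = np.allclose(vec(A2 @ T2 @ B2), np.kron(B2.T, A2) @ vec(T2))

print(mixed_product_ok, vec_identity_ok)
```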



B. Conventions used for Jacobian and Hessian matrices 

In this work we make extensive use of differentiation of matrix functions $\Psi$ with respect to a matrix argument $T$.
From the many possibilities of displaying the partial derivatives $\partial^r \Psi_{st} / \partial T_{ij} \cdots \partial T_{kl}$, we will stick to the "good
notation" introduced by Magnus and Neudecker in [15, Section 9.4], which is briefly reproduced next for the sake
of completeness.

Definition B.1: Let $\Psi$ be a differentiable $q \times t$ real matrix function of an $r \times s$ matrix of real variables $T$. The
Jacobian matrix of $\Psi$ at $T = T_0$ is the $qt \times rs$ matrix

$$\mathsf{D}_T \Psi(T_0) = \left. \frac{\partial \operatorname{vec} \Psi(T)}{\partial (\operatorname{vec} T)^\top} \right|_{T = T_0}. \tag{140}$$

Remark B.2: To properly deal with the case where $\Psi$ is a symmetric matrix, the vec operator in the numerator
in (140) has to be replaced by a vech operator to avoid obtaining repeated elements. Similarly, vech has to replace
vec in the denominator in (140) for the case where $T$ is a symmetric matrix. For practical purposes, it is enough
to calculate the Jacobian without taking into account any symmetry properties and then add a left factor $D_q^{+}$ to
the obtained Jacobian when $\Psi$ is symmetric and/or a right factor $D_r$ when $T$ is symmetric. This procedure will
become clearer in the examples given below.

Definition B.3: Let $\Psi$ be a twice differentiable $q \times t$ real matrix function of an $r \times s$ matrix of real variables $T$.
The Hessian matrix of $\Psi$ at $T = T_0$ is the $qtrs \times rs$ matrix

$$\mathsf{H}_T \Psi(T_0) = \left. \mathsf{D}_T \big( \mathsf{D}_T^\top \Psi(T) \big) \right|_{T = T_0} = \left. \frac{\partial}{\partial (\operatorname{vec} T)^\top} \operatorname{vec} \left( \frac{\partial \operatorname{vec} \Psi(T)}{\partial (\operatorname{vec} T)^\top} \right)^{\!\top} \right|_{T = T_0}. \tag{141}$$

One can verify that the obtained Hessian matrix for the matrix function $\Psi$ is the stacking of the $qt$ Hessian matrices
corresponding to each individual element of the vector $\operatorname{vec} \Psi$.

Remark B.4: Similarly to the Jacobian case, when $\Psi$ or $T$ are symmetric matrices, the vech operator has to
replace the vec operator where appropriate in (141).

One of the major advantages of using the notation of [15] is that a simple chain rule can be applied for both the
Jacobian and Hessian matrices, as detailed in the following lemma.

Lemma B.5 ([15, Theorems 5.8 and 6.9]): Let $\Upsilon$ be a twice differentiable $u \times v$ real matrix function of a $q \times t$
real matrix argument. Let $\Psi$ be a twice differentiable $q \times t$ real matrix function of an $r \times s$ matrix of real variables
$T$. Define $\Omega(T) = \Upsilon(\Psi(T))$. The Jacobian and Hessian matrices of $\Omega(T)$ at $T = T_0$ are:

$$\mathsf{D}_T \Omega(T_0) = \big( \mathsf{D}_\Psi \Upsilon(\Psi_0) \big) \big( \mathsf{D}_T \Psi(T_0) \big) \tag{142}$$

$$\mathsf{H}_T \Omega(T_0) = \big( I_{uv} \otimes \mathsf{D}_T \Psi(T_0) \big)^\top \mathsf{H}_\Psi \Upsilon(\Psi_0) \big( \mathsf{D}_T \Psi(T_0) \big) + \big( \mathsf{D}_\Psi \Upsilon(\Psi_0) \otimes I_{rs} \big) \mathsf{H}_T \Psi(T_0), \tag{143}$$

where $\Psi_0 = \Psi(T_0)$.

The notation introduced above unifies the study of scalar (q = t = 1), vector (t = 1), and matrix functions of 
scalar (r = s = 1), vector (s = 1), or matrix arguments into the study of vector functions of vector arguments 
through the use of the vec and vech operators. However, the idea of arranging the partial derivatives of a scalar 
function of a matrix argument $\psi(T)$ into a matrix rather than a vector is quite appealing and sometimes useful, so
we will also make use of the notation described next.

Definition B.6: Let $\psi$ be a differentiable scalar function of an $r \times s$ matrix of real variables $T$. The gradient of $\psi$
at $T = T_0$ is the $r \times s$ matrix

$$\nabla_T \psi(T_0) = \left. \frac{\partial \psi(T)}{\partial T} \right|_{T = T_0}. \tag{144}$$

It is easy to verify that $\mathsf{D}_T \psi(T_0) = \operatorname{vec}^\top \nabla_T \psi(T_0)$.

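We now give expressions for the most common Jacobian and Hessian matrices encountered during our developments.

Lemma B.7: Consider $A \in \mathbb{R}^{q \times r}$, $T \in \mathbb{R}^{r \times s}$, $B \in \mathbb{R}^{s \times t}$, $R \in \mathbb{S}^{s}$, and $f \in \mathbb{R}^{r \times 1}$, such that $f$ is a function of $T$.
Then, the following holds:

1) If $\Psi = ATB$, we have $\mathsf{D}_T \Psi = (B^\top \otimes A)$. If, in addition, $B$ is a function of $T$, then we have $\mathsf{D}_T \Psi = (B^\top \otimes A) + (I_t \otimes AT) \mathsf{D}_T B$.
2) If $\Psi = Af$, we have $\mathsf{D}_T \Psi = A\, \mathsf{D}_T f$.
3) If $\Psi = ATA^\top$, with $T$ being a symmetric matrix, we have $\mathsf{D}_T \Psi = D_q^{+} (A \otimes A) D_r$.
4) If $\Psi = B^\top T^\top A^\top$, we have $\mathsf{D}_T \Psi = (A \otimes B^\top) K_{r,s}$.
5) If $\Psi = (T \otimes A)$, we have $\mathsf{D}_T \Psi = (I_s \otimes K_{t,r} \otimes I_q)(I_{rs} \otimes \operatorname{vec} A)$, and if $\Psi = (A \otimes T)$, we have $\mathsf{D}_T \Psi = (I_t \otimes K_{s,q} \otimes I_r)(\operatorname{vec} A \otimes I_{rs})$, where in this case we have assumed that $A \in \mathbb{R}^{q \times t}$.
6) If $\Psi = T^{-1}$, we have $\mathsf{D}_T \Psi = -(T^{-\top} \otimes T^{-1})$, where $T$ is a square invertible matrix.
7) If $\Psi = ATRT^\top A^\top$, we have $\mathsf{D}_T \Psi = 2 D_q^{+} (ATR \otimes A)$.
8) If $\Psi = ATRT^\top A^\top$, we have $\mathsf{H}_T \Psi = 2 (D_q^{+} \otimes I_{rs})(I_q \otimes K_{q,s} \otimes I_r)(A \otimes R \otimes \operatorname{vec} A^\top) K_{r,s}$.

Proof: The identities from 1) to 7) can be found in [15, Chapter 9]. Concerning identity 8), it can be calculated
through the definition of the Hessian as

$$\mathsf{H}_T (ATRT^\top A^\top) = 2\, \mathsf{D}_T \big( D_q^{+} (ATR \otimes A) \big)^\top = 2\, \mathsf{D}_T \big( (RT^\top A^\top \otimes A^\top) D_q^{+\top} \big). \tag{145}$$

Now, we define $\Upsilon = RT^\top A^\top$ and $\Omega = \Upsilon \otimes A^\top$ and apply the chain rule twice to obtain

$$\mathsf{D}_T \big( (RT^\top A^\top \otimes A^\top) D_q^{+\top} \big) = \mathsf{D}_\Omega (\Omega D_q^{+\top})\, \mathsf{D}_\Upsilon \Omega\, \mathsf{D}_T \Upsilon, \tag{146}$$

from which the result in 8) follows by application of identities 1), 5), and 4) and from Lemma A.3. ■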
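Closed-form Jacobians such as those in Lemma B.7 can be cross-checked against finite differences. The sketch below does so for identities 1), 4), and 6), assuming column-major vectorization and the convention $K_{m,n} \operatorname{vec} A = \operatorname{vec} A^\top$ for an $m \times n$ matrix $A$. The helper names, test dimensions, and step size are illustrative assumptions.

```python
import numpy as np

def vec(M):
    """Column-major vectorization, matching the vec operator of the text."""
    return np.asarray(M).reshape(-1, order="F")

def commutation_matrix(m, n):
    """K_{m,n} such that K_{m,n} vec(A) = vec(A^T) for any m x n matrix A."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[j + i * n, i + j * m] = 1.0
    return K

def numerical_jacobian(func, T, eps=1e-6):
    """Central-difference estimate of d vec(func(T)) / d (vec T)^T."""
    cols = []
    for idx in range(T.size):
        step = np.zeros(T.size)
        step[idx] = eps
        dT = step.reshape(T.shape, order="F")
        cols.append((vec(func(T + dT)) - vec(func(T - dT))) / (2 * eps))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(3)
q, r, s, t = 2, 4, 3, 5
A = rng.standard_normal((q, r))
T = rng.standard_normal((r, s))
B = rng.standard_normal((s, t))

# Identity 1: Psi = A T B  ->  D_T Psi = B^T kron A
jac_1 = numerical_jacobian(lambda X: A @ X @ B, T)
ok_1 = np.allclose(jac_1, np.kron(B.T, A), atol=1e-6)

# Identity 4: Psi = B^T T^T A^T  ->  D_T Psi = (A kron B^T) K_{r,s}
jac_4 = numerical_jacobian(lambda X: B.T @ X.T @ A.T, T)
ok_4 = np.allclose(jac_4, np.kron(A, B.T) @ commutation_matrix(r, s), atol=1e-6)

# Identity 6: Psi = T^{-1}  ->  D_T Psi = -(T^{-T} kron T^{-1})
T_sq = 4.0 * np.eye(3) + 0.2 * rng.standard_normal((3, 3))  # well-conditioned test matrix
T_inv = np.linalg.inv(T_sq)
jac_6 = numerical_jacobian(np.linalg.inv, T_sq)
ok_6 = np.allclose(jac_6, -np.kron(T_inv.T, T_inv), atol=1e-6)

print(ok_1, ok_4, ok_6)
```

The same `numerical_jacobian` helper could be applied to the remaining identities; the symmetric cases additionally require the duplication matrix factors described in Remark B.2.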
C. Differential properties of the quantities $P_Y(y)$, $P_{Y|S}(y|s)$, and $\mathrm{E}\{S \mid y\}$

In this appendix we present a number of lemmas which are used in the proofs of Theorems 1 and 2 in Appendices
D and E.

In the proofs of the following lemmas we interchange the order of differentiation and expectation, which can be
justified following similar steps as in [1, Appendix B].

Lemma C.1: Let $Y = X + CN$, where $X$ is arbitrarily distributed and where $N$ is a zero-mean Gaussian
random variable with covariance matrix $R_N$ and independent of $X$. Then, the probability density function $P_Y(y)$
satisfies

$$\nabla_C P_Y(y) = \mathsf{H}_y P_Y(y)\, C R_N. \tag{147}$$

Proof: First, we recall that $P_Y(y) = \mathrm{E}\{P_{Y|X}(y|X)\}$. Thus the matrix gradient of the density $P_Y(y)$ with
respect to $C$ is $\nabla_C P_Y(y) = \mathrm{E}\{\nabla_C P_{Y|X}(y|X)\}$. The computation of the inner gradient $\nabla_C P_{Y|X}(y|X)$ can
be performed by replacing $Gs$ by $x$ in (3) together with

$$\nabla_C\, a^\top (C R_N C^\top)^{-1} a = -2 (C R_N C^\top)^{-1} a a^\top (C R_N C^\top)^{-1} C R_N \tag{148}$$

$$\nabla_C \det(C R_N C^\top) = 2 \det(C R_N C^\top) (C R_N C^\top)^{-1} C R_N, \tag{149}$$

where $a$ is a fixed vector of the appropriate dimension and where we have used [15, p. 178, Exercise 9.9.3] and
the chain rule in Lemma B.5 in (148) and [15, p. 180, Exercise 9.10.2] in (149). With these expressions at hand
and recalling that $R_Z = C R_N C^\top$, the expression for the gradient $\nabla_C P_Y(y)$ is equal to

$$\nabla_C P_Y(y) = \mathrm{E}\big\{ P_{Y|X}(y|X) \big( R_Z^{-1} (y - X)(y - X)^\top R_Z^{-1} - R_Z^{-1} \big) \big\}\, C R_N. \tag{150}$$

To complete the proof, we need to calculate the Hessian matrix $\mathsf{H}_y P_Y(y)$. First consider the following two
Jacobians

$$\mathsf{D}_y\, (y - x)^\top R_Z^{-1} (y - x) = 2 (y - x)^\top R_Z^{-1} \tag{151}$$

$$\mathsf{D}_y\, R_Z^{-1} (y - x) = R_Z^{-1}, \tag{152}$$

which follow directly from [15, Section 9.9, Table 3] and [15, Section 9.12]. Now, from (151), we can first obtain
the Jacobian row vector $\mathsf{D}_y P_Y(y)$ as

$$\mathsf{D}_y P_Y(y) = -\mathrm{E}\big\{ P_{Y|X}(y|X) (y - X)^\top R_Z^{-1} \big\}. \tag{153}$$

Recalling the expression in (152) and that $\mathsf{H}_y P_Y(y) = \mathsf{D}_y (\mathsf{D}_y^\top P_Y(y))$, the Hessian matrix becomes

$$\mathsf{H}_y P_Y(y) = \mathrm{E}\big\{ P_{Y|X}(y|X) \big( R_Z^{-1} (y - X)(y - X)^\top R_Z^{-1} - R_Z^{-1} \big) \big\}. \tag{154}$$

By simple inspection from (150) and (154) the result in (147) follows. ■
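As an illustration of Lemma C.1, the identity (147) can be tested numerically when $X$ takes finitely many values, so that $P_Y$ is a Gaussian mixture. The sketch below compares a finite-difference gradient with respect to $C$ against the closed-form Hessian (154) multiplied by $C R_N$. All numerical values (the matrices `C` and `R_N`, the atoms `xs`, the probabilities `ps`, and the evaluation point `y0`) are arbitrary assumptions chosen only for the check.

```python
import numpy as np

def gauss_pdf(y, mean, cov):
    """Density of a Gaussian vector with the given mean and covariance at y."""
    d = y - mean
    quad = d @ np.linalg.solve(cov, d)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** y.size * np.linalg.det(cov))

def p_y(y, C, R_N, xs, ps):
    """P_Y(y) = E{P_{Y|X}(y|X)} for a discrete X with atoms xs and probabilities ps."""
    R_Z = C @ R_N @ C.T
    return sum(p * gauss_pdf(y, x, R_Z) for x, p in zip(xs, ps))

def hessian_p_y(y, C, R_N, xs, ps):
    """Closed form (154) for the Hessian of P_Y with respect to y."""
    R_Z = C @ R_N @ C.T
    R_inv = np.linalg.inv(R_Z)
    H = np.zeros((y.size, y.size))
    for x, p in zip(xs, ps):
        d = y - x
        H += p * gauss_pdf(y, x, R_Z) * (R_inv @ np.outer(d, d) @ R_inv - R_inv)
    return H

def grad_C_numeric(y, C, R_N, xs, ps, eps=1e-6):
    """Central-difference estimate of the matrix gradient of P_Y(y) with respect to C."""
    grad = np.zeros_like(C)
    for k in range(C.shape[0]):
        for l in range(C.shape[1]):
            dC = np.zeros_like(C)
            dC[k, l] = eps
            grad[k, l] = (p_y(y, C + dC, R_N, xs, ps) - p_y(y, C - dC, R_N, xs, ps)) / (2 * eps)
    return grad

# Illustrative setup: n = 2 outputs, n' = 3 noise components.
C = np.array([[1.0, 0.3, -0.2], [0.1, 0.8, 0.5]])
R_N = np.array([[1.0, 0.2, 0.0], [0.2, 1.5, 0.3], [0.0, 0.3, 0.8]])
xs = [np.array([-1.0, 0.5]), np.array([1.0, -0.5]), np.array([0.3, 1.2])]
ps = [0.2, 0.5, 0.3]
y0 = np.array([0.4, -0.1])

lhs_147 = grad_C_numeric(y0, C, R_N, xs, ps)
rhs_147 = hessian_p_y(y0, C, R_N, xs, ps) @ C @ R_N
print(np.max(np.abs(lhs_147 - rhs_147)))
```

A discrete input is used only because it makes the expectation a finite sum; the lemma itself places no such restriction on $X$.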
Lemma C.2: Let $Y = GS + Z$, where $S$ is arbitrarily distributed and where $Z$ is a zero-mean Gaussian random
variable with covariance matrix $R_Z$ and independent of $S$. Then, $P_Y(y)$ satisfies

$$\nabla_G P_Y(y) = -\mathrm{E}\big\{ \mathsf{D}_y^\top P_{Y|S}(y|S)\, S^\top \big\}. \tag{155}$$

Proof: First we write

$$\mathsf{D}_y P_{Y|S}(y|s) = -P_{Y|S}(y|s) (y - Gs)^\top R_Z^{-1}, \tag{156}$$

where we have used (151). Now, we simply need to notice that

$$\nabla_G P_{Y|S}(y|s) = P_{Y|S}(y|s)\, R_Z^{-1} (y - Gs) s^\top = -\mathsf{D}_y^\top P_{Y|S}(y|s)\, s^\top, \tag{157}$$

where we have used $\nabla_G (y - Gs)^\top R_Z^{-1} (y - Gs) = -2 R_Z^{-1} (y - Gs) s^\top$, which follows from [15, Section 9.9,
Table 4]. Recalling that $\nabla_G P_Y(y) = \mathrm{E}\{\nabla_G P_{Y|S}(y|S)\}$, the result follows. ■
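Lemma C.3: Let $Y = GS + Z$, where $S$ is arbitrarily distributed and where $Z$ is a zero-mean Gaussian random
variable with covariance matrix $R_Z$ and independent of $S$. Then,

$$\mathsf{D}_y \mathrm{E}\{S \mid y\} = \Phi_S(y)\, G^\top R_Z^{-1}. \tag{158}$$

Proof:

$$\mathsf{D}_y \mathrm{E}\{S \mid y\} = \mathsf{D}_y \mathrm{E}\left\{ S\, \frac{P_{Y|S}(y|S)}{P_Y(y)} \right\} \tag{159}$$

$$= \mathrm{E}\left\{ S\, \frac{P_Y(y)\, \mathsf{D}_y P_{Y|S}(y|S) - P_{Y|S}(y|S)\, \mathsf{D}_y P_Y(y)}{P_Y(y)^2} \right\} \tag{160}$$

$$= \mathrm{E}\left\{ S\, \frac{-P_{Y|S}(y|S) (y - GS)^\top R_Z^{-1} + P_{Y|S}(y|S) (y - G\, \mathrm{E}\{S \mid y\})^\top R_Z^{-1}}{P_Y(y)} \right\} \tag{161}$$

$$= \big( \mathrm{E}\{S S^\top \mid y\} - \mathrm{E}\{S \mid y\}\, \mathrm{E}\{S^\top \mid y\} \big) G^\top R_Z^{-1}. \tag{162}$$

Now, from the definition of $\Phi_S(y)$ the result in the lemma follows. Note that we have used the expression in (156)
for $\mathsf{D}_y P_{Y|S}(y|S)$ and that from (153) we can write

$$\mathsf{D}_y P_Y(y) = -\mathrm{E}\big\{ P_{Y|X}(y|X) (y - X)^\top R_Z^{-1} \big\} = -P_Y(y) (y - G\, \mathrm{E}\{S \mid y\})^\top R_Z^{-1}. \tag{163}$$

■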
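Lemma C.3 can also be checked numerically for a discrete input, for which the conditional mean and the conditional covariance $\Phi_S(y)$ are finite weighted sums. The sketch below compares a finite-difference Jacobian of $\mathrm{E}\{S \mid y\}$ with $\Phi_S(y) G^\top R_Z^{-1}$. The channel `G`, covariance `R_Z`, atoms, probabilities, and evaluation point are arbitrary assumptions made for the check.

```python
import numpy as np

def posterior_moments(y, G, R_Z, atoms, probs):
    """Return E{S|y} and Phi_S(y) for a discrete S (rows of `atoms`) observed through Y = GS + Z."""
    R_inv = np.linalg.inv(R_Z)
    d = y[None, :] - atoms @ G.T                       # row k holds y - G s_k
    log_w = np.log(probs) - 0.5 * np.einsum("ki,ij,kj->k", d, R_inv, d)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                       # posterior probabilities of the atoms
    mean = w @ atoms
    second = (atoms * w[:, None]).T @ atoms            # E{S S^T | y}
    return mean, second - np.outer(mean, mean)

def jacobian_cond_mean(y, G, R_Z, atoms, probs, eps=1e-6):
    """Central-difference estimate of D_y E{S|y} (an m x n matrix)."""
    cols = []
    for k in range(y.size):
        e = np.zeros(y.size)
        e[k] = eps
        plus, _ = posterior_moments(y + e, G, R_Z, atoms, probs)
        minus, _ = posterior_moments(y - e, G, R_Z, atoms, probs)
        cols.append((plus - minus) / (2 * eps))
    return np.stack(cols, axis=1)

# Illustrative setup: n = 2 outputs, m = 3 inputs, four equally arbitrary atoms.
G = np.array([[1.0, 0.5, -0.3], [0.2, -0.7, 0.9]])
R_Z = np.array([[1.0, 0.3], [0.3, 0.8]])
atoms = np.array([[1.0, 0.0, -1.0], [0.5, -1.0, 0.2], [-0.8, 0.3, 1.1], [0.0, 1.2, -0.4]])
probs = np.array([0.1, 0.4, 0.3, 0.2])
y0 = np.array([0.3, -0.2])

_, Phi = posterior_moments(y0, G, R_Z, atoms, probs)
lhs_158 = jacobian_cond_mean(y0, G, R_Z, atoms, probs)
rhs_158 = Phi @ G.T @ np.linalg.inv(R_Z)
print(np.max(np.abs(lhs_158 - rhs_158)))
```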

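Lemma C.4: Let $Y = GS + Z$ (with $X = GS$), where $S$ is arbitrarily distributed and where $Z$ is a zero-mean Gaussian random
variable with covariance matrix $R_Z$ and independent of $S$. Then, the Jacobian and Hessian of $\log P_Y(y)$ satisfy

$$\mathsf{D}_y \log P_Y(y) = (\mathrm{E}\{X \mid y\} - y)^\top R_Z^{-1} \tag{164}$$

$$\mathsf{H}_y \log P_Y(y) = R_Z^{-1} \Phi_X(y) R_Z^{-1} - R_Z^{-1}. \tag{165}$$

Proof: Recalling the expression in (153) we can write

$$\mathsf{D}_y \log P_Y(y) = \frac{1}{P_Y(y)}\, \mathsf{D}_y P_Y(y) \tag{166}$$

$$= -\frac{1}{P_Y(y)}\, \mathrm{E}\big\{ P_{Y|X}(y|X) (y - X)^\top R_Z^{-1} \big\} \tag{167}$$

$$= (\mathrm{E}\{X \mid y\} - y)^\top R_Z^{-1}. \tag{168}$$

From the Jacobian expression, the Hessian can be computed as

$$\mathsf{H}_y \log P_Y(y) = \mathsf{D}_y\, R_Z^{-1} (\mathrm{E}\{X \mid y\} - y) \tag{169}$$

$$= R_Z^{-1} \big( G\, \mathsf{D}_y \mathrm{E}\{S \mid y\} - I_n \big) \tag{170}$$

$$= R_Z^{-1} \big( G \Phi_S(y) G^\top R_Z^{-1} - I_n \big), \tag{171}$$

where the expression for $\mathsf{D}_y \mathrm{E}\{S \mid y\}$ follows from Lemma C.3. ■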
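Lemma C.5: Let $Y = GS + Z$, where $S$ is arbitrarily distributed (with $i$-th element denoted by $S_i$) and where
$Z$ is a zero-mean Gaussian random variable with covariance matrix $R_Z$ and independent of $S$. Then,

$$\nabla_G \mathrm{E}\{S_i \mid y\} = \frac{1}{P_Y(y)} \Big( \mathrm{E}\{S_i \mid y\}\, \mathrm{E}\big\{ \mathsf{D}_y^\top P_{Y|S}(y|S)\, S^\top \big\} - \mathrm{E}\big\{ S_i\, \mathsf{D}_y^\top P_{Y|S}(y|S)\, S^\top \big\} \Big).$$

Proof: The proof follows from this chain of equalities:

$$\nabla_G \mathrm{E}\{S_i \mid y\} = \nabla_G \mathrm{E}\left\{ S_i\, \frac{P_{Y|S}(y|S)}{P_Y(y)} \right\}$$

$$= \mathrm{E}\left\{ S_i\, \frac{P_Y(y)\, \nabla_G P_{Y|S}(y|S) - P_{Y|S}(y|S)\, \nabla_G P_Y(y)}{P_Y(y)^2} \right\}$$

$$= \mathrm{E}\left\{ S_i \left( -\frac{\mathsf{D}_y^\top P_{Y|S}(y|S)\, S^\top}{P_Y(y)} + \frac{P_{Y|S}(y|S)\, \mathrm{E}\{ \mathsf{D}_y^\top P_{Y|S}(y|S)\, S^\top \}}{P_Y(y)^2} \right) \right\} \tag{172}$$

$$= -\frac{1}{P_Y(y)}\, \mathrm{E}\big\{ S_i\, \mathsf{D}_y^\top P_{Y|S}(y|S)\, S^\top \big\} + \mathrm{E}\left\{ S_i\, \frac{P_{Y|S}(y|S)}{P_Y(y)} \right\} \frac{\mathrm{E}\{ \mathsf{D}_y^\top P_{Y|S}(y|S)\, S^\top \}}{P_Y(y)},$$

where (172) follows from Lemma C.2 and from (157). ■

D. Proof of Theorem 1

Let us begin by considering the expression for the entries of the Jacobian of the vector $\operatorname{vec} J_Y$, which is
$[\mathsf{D}_C \operatorname{vec} J_Y]_{i+(j-1)n,\,k+(l-1)n} = \mathsf{D}_{C_{kl}} [J_Y]_{ij}$, where throughout this proof $\{i,j,k\} \in [1,n]$ and $l \in [1,n']$. From
(13) and (14) we have that the entries of the Fisher information matrix are given by

$$[J_Y]_{ij} = \mathrm{E}\{[\Gamma_Y(Y)]_{ij}\} = -\int P_Y(y)\, \frac{\partial^2 \log P_Y(y)}{\partial y_i \partial y_j}\, \mathrm{d}y. \tag{174}$$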
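We now differentiate the expression above with respect to the entries of the matrix $C$ and we get

$$\frac{\partial [J_Y]_{ij}}{\partial C_{kl}} = -\int \frac{\partial P_Y(y)}{\partial C_{kl}}\, \frac{\partial^2 \log P_Y(y)}{\partial y_i \partial y_j}\, \mathrm{d}y - \int P_Y(y)\, \frac{\partial^2}{\partial y_i \partial y_j} \left( \frac{1}{P_Y(y)}\, \frac{\partial P_Y(y)}{\partial C_{kl}} \right) \mathrm{d}y, \tag{175}$$

where the interchange of the order of integration and differentiation can be justified from the Dominated Convergence
Theorem following similar steps as in [1, Appendix B]. Now, using Lemma C.1 we transform the partial derivatives
with respect to $C$ into derivatives with respect to the entries of vector $y$, yielding

$$\mathsf{D}_{C_{kl}} [J_Y]_{ij} = -\int [\mathsf{H}_y P_Y(y)\, C R_N]_{kl}\, \frac{\partial^2 \log P_Y(y)}{\partial y_i \partial y_j}\, \mathrm{d}y - \int P_Y(y)\, \frac{\partial^2}{\partial y_i \partial y_j} \left( \frac{1}{P_Y(y)}\, [\mathsf{H}_y P_Y(y)\, C R_N]_{kl} \right) \mathrm{d}y. \tag{176}$$

Expressing the elements of $[\mathsf{H}_y P_Y(y)\, C R_N]_{kl}$ as the sum of the product of the elements of $\mathsf{H}_y P_Y(y)$ and $C R_N$
we get

$$\mathsf{D}_{C_{kl}} [J_Y]_{ij} = -\sum_{r=1}^{n} [C R_N]_{rl} \left( \int \frac{\partial^2 P_Y(y)}{\partial y_k \partial y_r}\, \frac{\partial^2 \log P_Y(y)}{\partial y_i \partial y_j}\, \mathrm{d}y + \int P_Y(y)\, \frac{\partial^2}{\partial y_i \partial y_j} \left( \frac{1}{P_Y(y)}\, \frac{\partial^2 P_Y(y)}{\partial y_k \partial y_r} \right) \mathrm{d}y \right). \tag{177}$$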

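We can now combine the integral identities (221) and (222) derived in Proposition H.4 to rewrite the first term in
the last equation as

$$\int \frac{\partial^2 P_Y(y)}{\partial y_k \partial y_r}\, \frac{\partial^2 \log P_Y(y)}{\partial y_i \partial y_j}\, \mathrm{d}y = \int P_Y(y)\, \frac{\partial^4 \log P_Y(y)}{\partial y_i \partial y_j \partial y_k \partial y_r}\, \mathrm{d}y. \tag{178}$$

Now, applying a scalar version of the logarithm identity in (12),

$$\frac{1}{P_Y(y)}\, \frac{\partial^2 P_Y(y)}{\partial y_k \partial y_r} = \frac{\partial^2 \log P_Y(y)}{\partial y_k \partial y_r} + \frac{\partial \log P_Y(y)}{\partial y_k}\, \frac{\partial \log P_Y(y)}{\partial y_r}, \tag{179}$$

the second term in the right hand side of (177) becomes

$$\int P_Y(y)\, \frac{\partial^2}{\partial y_i \partial y_j} \left( \frac{1}{P_Y(y)}\, \frac{\partial^2 P_Y(y)}{\partial y_k \partial y_r} \right) \mathrm{d}y = \int P_Y(y)\, \frac{\partial^4 \log P_Y(y)}{\partial y_i \partial y_j \partial y_k \partial y_r}\, \mathrm{d}y$$

$$\quad + \int P_Y(y) \left( \frac{\partial^2 \log P_Y(y)}{\partial y_i \partial y_k}\, \frac{\partial^2 \log P_Y(y)}{\partial y_j \partial y_r} + \frac{\partial^2 \log P_Y(y)}{\partial y_i \partial y_r}\, \frac{\partial^2 \log P_Y(y)}{\partial y_j \partial y_k} \right) \mathrm{d}y$$

$$\quad + \int P_Y(y) \left( \frac{\partial^3 \log P_Y(y)}{\partial y_i \partial y_j \partial y_k}\, \frac{\partial \log P_Y(y)}{\partial y_r} + \frac{\partial^3 \log P_Y(y)}{\partial y_i \partial y_j \partial y_r}\, \frac{\partial \log P_Y(y)}{\partial y_k} \right) \mathrm{d}y. \tag{180}$$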

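Using the regularity condition (226) in Corollary H.5, the two third-order terms in (180) cancel the two fourth-order terms coming from (178) and (180), and we finally obtain the desired result

$$\mathsf{D}_{C_{kl}} [J_Y]_{ij} = -\int P_Y(y) \sum_{r=1}^{n} [C R_N]_{rl} \left( \frac{\partial^2 \log P_Y(y)}{\partial y_i \partial y_r}\, \frac{\partial^2 \log P_Y(y)}{\partial y_j \partial y_k} + \frac{\partial^2 \log P_Y(y)}{\partial y_j \partial y_r}\, \frac{\partial^2 \log P_Y(y)}{\partial y_i \partial y_k} \right) \mathrm{d}y. \tag{181}$$

Now, recalling from (174) that $\Gamma_Y(y) = -\mathsf{H}_y \log P_Y(y)$ and identifying the elements of the two matrices $\Gamma_Y(y) C R_N$ and
$\Gamma_Y(y)$ with the terms in (181), we obtain

$$\mathsf{D}_{C_{kl}} [J_Y]_{ij} = -\mathrm{E}\big\{ [\Gamma_Y(Y) C R_N]_{il} [\Gamma_Y(Y)]_{jk} + [\Gamma_Y(Y) C R_N]_{jl} [\Gamma_Y(Y)]_{ik} \big\}. \tag{182}$$

Finally, taking into account that $[\mathsf{D}_C \operatorname{vec} J_Y]_{i+(j-1)n,\,k+(l-1)n} = \mathsf{D}_{C_{kl}} [J_Y]_{ij}$ and applying Lemma A.1 with $A =
\Gamma_Y(Y) C R_N$ and $B = \Gamma_Y(Y)$, it can be easily shown that

$$\mathsf{D}_C \operatorname{vec} J_Y = -\mathrm{E}\big\{ \Gamma_Y(Y) C R_N \otimes \Gamma_Y(Y) + K_n \big( \Gamma_Y(Y) C R_N \otimes \Gamma_Y(Y) \big) \big\} \tag{183}$$

$$= -2 N_n\, \mathrm{E}\big\{ \Gamma_Y(Y) C R_N \otimes \Gamma_Y(Y) \big\}. \tag{184}$$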

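E. Proof of Theorem 2

Throughout this proof we assume that $\{i,j,l\} \in [1,m]$ and $k \in [1,n]$. Now, let us begin by considering the
expression for the entries of the matrix $E_S$:

$$[E_S]_{ij} = \mathrm{E}\{(S_i - \mathrm{E}\{S_i \mid Y\})(S_j - \mathrm{E}\{S_j \mid Y\})\} = \mathrm{E}\{S_i S_j\} - \mathrm{E}\{\mathrm{E}\{S_i \mid Y\}\, \mathrm{E}\{S_j \mid Y\}\}. \tag{185}$$

Since the first term in the last expression does not depend on $G$, we have that

$$\mathsf{D}_{G_{kl}} [E_S]_{ij} = -\mathsf{D}_{G_{kl}} \mathrm{E}\{\mathrm{E}\{S_i \mid Y\}\, \mathrm{E}\{S_j \mid Y\}\} = -\frac{\partial}{\partial G_{kl}} \int P_Y(y)\, \mathrm{E}\{S_i \mid y\}\, \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y$$

$$= -\int \frac{\partial P_Y(y)}{\partial G_{kl}}\, \mathrm{E}\{S_i \mid y\}\, \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y - \int P_Y(y)\, \frac{\partial \mathrm{E}\{S_i \mid y\}}{\partial G_{kl}}\, \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y - \int P_Y(y)\, \mathrm{E}\{S_i \mid y\}\, \frac{\partial \mathrm{E}\{S_j \mid y\}}{\partial G_{kl}}\, \mathrm{d}y, \tag{186}$$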

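where, as in Appendix D, the justification of this interchange of the order of differentiation and integration, and of two
other interchanges below, follows similar steps as in [1, Appendix B].

Note that the second and third terms in (186) have the same structure and, thus, we will deal with them jointly.
The first term in (186) can be rewritten as

$$-\int \frac{\partial P_Y(y)}{\partial G_{kl}}\, \mathrm{E}\{S_i \mid y\}\, \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y = \int \mathrm{E}\left\{ S_l\, \frac{\partial P_{Y|S}(y|S)}{\partial y_k} \right\} \mathrm{E}\{S_i \mid y\}\, \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y, \tag{187}$$

where we have used Lemma C.2 to transform the derivative with respect to $G$ into a derivative with respect to $y$.
Using Lemma C.5 the second term in (186) can be computed as

$$-\int P_Y(y)\, \frac{\partial \mathrm{E}\{S_i \mid y\}}{\partial G_{kl}}\, \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y = -\int \mathrm{E}\{S_i \mid y\}\, \mathrm{E}\left\{ S_l\, \frac{\partial P_{Y|S}(y|S)}{\partial y_k} \right\} \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y + \int \mathrm{E}\left\{ S_i S_l\, \frac{\partial P_{Y|S}(y|S)}{\partial y_k} \right\} \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y. \tag{188}$$

Note that the third term in (186) can be obtained by interchanging the roles of $i$ and $j$ in the last equation. Plugging
the expressions (187) and (188) into (186) we can write

$$\mathsf{D}_{G_{kl}} [E_S]_{ij} = -\int \mathrm{E}\{S_i \mid y\}\, \mathrm{E}\left\{ S_l\, \frac{\partial P_{Y|S}(y|S)}{\partial y_k} \right\} \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y + \int \mathrm{E}\left\{ S_i S_l\, \frac{\partial P_{Y|S}(y|S)}{\partial y_k} \right\} \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y + \int \mathrm{E}\left\{ S_j S_l\, \frac{\partial P_{Y|S}(y|S)}{\partial y_k} \right\} \mathrm{E}\{S_i \mid y\}\, \mathrm{d}y. \tag{189}$$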

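We now simplify the obtained expression. The first term can be reformulated as

$$-\int \mathrm{E}\left\{ S_l\, \frac{\partial P_{Y|S}(y|S)}{\partial y_k} \right\} \mathrm{E}\{S_i \mid y\}\, \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y = -\int \frac{\partial\, \mathrm{E}\{S_l\, P_{Y|S}(y|S)\}}{\partial y_k}\, \mathrm{E}\{S_i \mid y\}\, \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y$$

$$= -\int \frac{\partial \big( P_Y(y)\, \mathrm{E}\{S_l \mid y\} \big)}{\partial y_k}\, \mathrm{E}\{S_i \mid y\}\, \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y$$

$$= \int P_Y(y)\, \mathrm{E}\{S_l \mid y\}\, \frac{\partial \big( \mathrm{E}\{S_i \mid y\}\, \mathrm{E}\{S_j \mid y\} \big)}{\partial y_k}\, \mathrm{d}y,$$

where in the last step we have integrated by parts as detailed in Proposition I.2. We now make use of Lemma C.3
to simplify the derivative inside the integration sign in the last equation to obtain

$$\int P_Y(y)\, \mathrm{E}\{S_l \mid y\}\, \frac{\partial \big( \mathrm{E}\{S_i \mid y\}\, \mathrm{E}\{S_j \mid y\} \big)}{\partial y_k}\, \mathrm{d}y = \int P_Y(y)\, \mathrm{E}\{S_l \mid y\} \Big( \mathrm{E}\{S_j \mid y\}\, [\Phi_S(y) G^\top R_Z^{-1}]_{ik} + \mathrm{E}\{S_i \mid y\}\, [\Phi_S(y) G^\top R_Z^{-1}]_{jk} \Big)\, \mathrm{d}y. \tag{190}$$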

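We now proceed to the computation of the second and third terms in (189) (note that they are in fact the same
term with the roles of $i$ and $j$ interchanged). We have

$$\int \mathrm{E}\left\{ S_i S_l\, \frac{\partial P_{Y|S}(y|S)}{\partial y_k} \right\} \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y = \int \frac{\partial\, \mathrm{E}\{S_i S_l\, P_{Y|S}(y|S)\}}{\partial y_k}\, \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y \tag{191}$$

$$= \int \frac{\partial \big( P_Y(y)\, \mathrm{E}\{S_i S_l \mid y\} \big)}{\partial y_k}\, \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y \tag{192}$$

$$= -\int P_Y(y)\, \mathrm{E}\{S_i S_l \mid y\}\, \frac{\partial \mathrm{E}\{S_j \mid y\}}{\partial y_k}\, \mathrm{d}y, \tag{193}$$

where the last equality follows by integrating by parts as in Proposition I.2. We are now ready to apply Lemma C.3 again
to obtain

$$-\int P_Y(y)\, \mathrm{E}\{S_i S_l \mid y\}\, \frac{\partial \mathrm{E}\{S_j \mid y\}}{\partial y_k}\, \mathrm{d}y = -\int P_Y(y)\, \mathrm{E}\{S_i S_l \mid y\}\, [\Phi_S(y) G^\top R_Z^{-1}]_{jk}\, \mathrm{d}y. \tag{194}$$

Plugging (190) and (194) into (189) and recalling that $[\Phi_S(y)]_{jl} = \mathrm{E}\{S_j S_l \mid y\} - \mathrm{E}\{S_j \mid y\}\, \mathrm{E}\{S_l \mid y\}$, we finally
have

$$\mathsf{D}_{G_{kl}} [E_S]_{ij} = -\int P_Y(y) \Big( [\Phi_S(y)]_{jl}\, [\Phi_S(y) G^\top R_Z^{-1}]_{ik} + [\Phi_S(y)]_{il}\, [\Phi_S(y) G^\top R_Z^{-1}]_{jk} \Big)\, \mathrm{d}y. \tag{195}$$

Taking into account that $\mathsf{D}_{G_{kl}} [E_S]_{ij} = [\mathsf{D}_G \operatorname{vec} E_S]_{i+(j-1)m,\,k+(l-1)n}$ and applying Lemma A.1 with $A = \Phi_S(Y)$
and $B = \Phi_S(Y) G^\top R_Z^{-1}$ we obtain

$$\mathsf{D}_G \operatorname{vec} E_S = -\mathrm{E}\big\{ \Phi_S(Y) \otimes \Phi_S(Y) G^\top R_Z^{-1} + K_m \big( \Phi_S(Y) \otimes \Phi_S(Y) G^\top R_Z^{-1} \big) \big\} \tag{196}$$

$$= -2 N_m\, \mathrm{E}\big\{ \Phi_S(Y) \otimes \Phi_S(Y) G^\top R_Z^{-1} \big\}. \tag{197}$$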

F. Proof of Theorem 4

The developments leading to the expressions for the Hessian matrices $\mathsf{H}_P h(Y)$, $\mathsf{H}_H h(Y)$, and $\mathsf{H}_C h(Y)$ follow
a very similar pattern. Consequently, we will present only one of them here.

Consider the Hessian $\mathsf{H}_P h(Y)$. From the expression for the Jacobian $\mathsf{D}_P h(Y)$ in (41) it follows that

$$\mathsf{H}_P h(Y) = \mathsf{D}_P \big( \mathsf{D}_P^\top h(Y) \big) = \mathsf{D}_P \operatorname{vec}(H^\top R_Z^{-1} H P E_S) \tag{198}$$

$$= (E_S \otimes H^\top R_Z^{-1} H) + (I_m \otimes H^\top R_Z^{-1} H P)\, D_m\, \mathsf{D}_P E_S, \tag{199}$$

where in (199) we have used Lemma B.7.1, adding the matrix $D_m$ because $E_S$ is a symmetric matrix. The final
expression for $\mathsf{H}_P h(Y)$ is obtained by plugging in (199) the expression for $\mathsf{D}_P E_S$ obtained in Theorem 3 and
recalling that $D_m D_m^{+} = N_m$.

The calculation of the Hessian matrix $\mathsf{H}_{R_Z} h(Y)$ from its Jacobian $\mathsf{D}_{R_Z} h(Y)$ in (44) follows:

$$\mathsf{H}_{R_Z} h(Y) = \tfrac{1}{2}\, \mathsf{D}_{R_Z} D_n^\top \operatorname{vec} J_Y = \tfrac{1}{2}\, \mathsf{D}_{R_Z} D_n^\top D_n \operatorname{vech} J_Y = \tfrac{1}{2}\, D_n^\top D_n\, \mathsf{D}_{R_Z} J_Y, \tag{200}$$

where, in the last equality, we have used Lemma B.7.2. Now, we only need to plug in the expression for $\mathsf{D}_{R_Z} J_Y$,
which can be found in Theorem 3, and note that $D_n^\top D_n D_n^{+} = D_n^\top N_n = D_n^\top$.

Finally, the Hessian matrix $\mathsf{H}_{R_N} h(Y)$ can be computed from its Jacobian $\mathsf{D}_{R_N} h(Y)$ in (45) as

$$\mathsf{H}_{R_N} h(Y) = \tfrac{1}{2}\, \mathsf{D}_{R_N} D_{n'}^\top \operatorname{vec}(C^\top J_Y C) \tag{201}$$

$$= \tfrac{1}{2}\, D_{n'}^\top (C^\top \otimes C^\top)\, D_n\, \mathsf{D}_{R_N} J_Y, \tag{202}$$

where we have used Lemmas A.4 and B.7.2 and also that $\mathsf{D}_{R_N} \operatorname{vec} J_Y = D_n\, \mathsf{D}_{R_N} \operatorname{vech} J_Y = D_n\, \mathsf{D}_{R_N} J_Y$, similarly
as in (200). Recalling the expression for $\mathsf{D}_{R_N} J_Y$ in Theorem 3, the result follows.

G. Matrix algebra results for the proof of the multidimensional EPI in Theorem 7

In this appendix we present a number of lemmas and propositions that are used in the proof of our multidimensional
EPI in Theorem 7.

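Lemma G.1 (Bhatia [28, p. 15]): Let $R \in \mathbb{S}_{+}^{s}$ be a positive semidefinite matrix, $R \geq 0$. Then,

$$\begin{bmatrix} R & R \\ R & R \end{bmatrix} \geq 0.$$

Proof: Since $R \geq 0$, consider $R = AA^\top$ and write

$$\begin{bmatrix} R & R \\ R & R \end{bmatrix} = \begin{bmatrix} A \\ A \end{bmatrix} \begin{bmatrix} A^\top & A^\top \end{bmatrix} \geq 0.$$

■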

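Lemma G.2 (Bhatia [28, Exercise 1.3.10]): Let $R \in \mathbb{S}_{++}^{s}$ be a positive definite matrix, $R > 0$. Then,

$$\begin{bmatrix} R & I_s \\ I_s & R^{-1} \end{bmatrix} \geq 0. \tag{203}$$

Proof: Consider again $R = AA^\top$; then we have $R^{-1} = A^{-\top} A^{-1}$. Now, simply write (203) as

$$\begin{bmatrix} R & I_s \\ I_s & R^{-1} \end{bmatrix} = \begin{bmatrix} A & 0 \\ 0 & A^{-\top} \end{bmatrix} \begin{bmatrix} I_s & I_s \\ I_s & I_s \end{bmatrix} \begin{bmatrix} A^\top & 0 \\ 0 & A^{-1} \end{bmatrix},$$

which, from Sylvester's law of inertia for congruent matrices [28, p. 5] and Lemma G.1, is positive semidefinite.

■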

Lemma G.3: If the matrices $R$ and $T$ are positive (semi)definite, then so is the product $R \otimes T$. In other words,
the class of positive (semi)definite matrices is closed under the Kronecker product.

Proof: See [27, p. 254, Fact 7.4.15]. ■

Corollary G.4 (Schur Theorem): The class of positive (semi)definite matrices is also closed under the Schur
matrix product, $R \circ T$.

Proof: The proof follows from Lemma G.3 by noting that the Schur product $R \circ T$ is a principal submatrix of the
Kronecker product $R \otimes T$ as in [27, Proposition 7.3.1] and that any principal submatrix of a positive (semi)definite
matrix is also positive (semi)definite [27, Propositions 8.2.6 and 8.2.7]. Alternatively, see [29, Theorem 7.5.3] or
[26, Theorem 5.2.1] for a completely different proof. ■
Lemma G.5 (Schur complement): Let the matrices $R \in \mathbb{S}_{++}^{s}$ and $T \in \mathbb{S}_{++}^{r}$ be positive definite, $R > 0$ and
$T > 0$, and not necessarily of the same dimension. Then the following statements are equivalent:

1) $\begin{bmatrix} R & A \\ A^\top & T \end{bmatrix} \geq 0$,

2) $T \geq A^\top R^{-1} A$,

3) $R \geq A T^{-1} A^\top$,

where $A \in \mathbb{R}^{s \times r}$ is any arbitrary matrix.

Proof: See [29, Theorem 7.7.6] and the second exercise following it or [27, Proposition 8.2.3]. ■
With the above lemmas at hand, we are now ready to prove the following proposition: 



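Proposition G.6: Consider two positive definite matrices $R \in \mathbb{S}_{++}^{s}$ and $T \in \mathbb{S}_{++}^{s}$ of the same dimension. Then
it follows that

$$R \circ T^{-1} \geq \operatorname{Diag}(R)\, (R \circ T)^{-1} \operatorname{Diag}(R). \tag{204}$$

Proof: From Lemmas G.1, G.2, and G.4 it follows that

$$\begin{bmatrix} R \circ T & \operatorname{Diag}(R) \\ \operatorname{Diag}(R) & R \circ T^{-1} \end{bmatrix} = \begin{bmatrix} R & R \\ R & R \end{bmatrix} \circ \begin{bmatrix} T & I_s \\ I_s & T^{-1} \end{bmatrix} \geq 0.$$

Now, from Lemma G.5 the result follows directly. ■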

Corollary G.7: Let $R \in \mathbb{S}_{++}^{s}$ be a positive definite matrix. Then,

$$\operatorname{diag}(R)^\top (R \circ R)^{-1} \operatorname{diag}(R) \leq s. \tag{205}$$

Proof: Particularizing the result in Proposition G.6 with $T = R$ and pre- and post-multiplying it by $\mathbf{1}^\top$ and
$\mathbf{1}$ we obtain

$$\mathbf{1}^\top (R \circ R^{-1})\, \mathbf{1} \geq \mathbf{1}^\top \operatorname{Diag}(R)\, (R \circ R)^{-1} \operatorname{Diag}(R)\, \mathbf{1}.$$

The result in (205) now follows straightforwardly from the fact $\mathbf{1}^\top (R \circ R^{-\top})\, \mathbf{1} = s$ [30] (see also [27, Fact
7.6.10], [26, Lemma 5.4.2(a)]). Note that $R$ is symmetric and thus $R^\top = R$ and $R^{-\top} = R^{-1}$. ■
Remark G.8: Note that the proof of Corollary G.7 is based on the result of Proposition G.6 in (204). An alternative
proof could follow similarly from a different inequality by Styan in [31],

$$R \circ R^{-1} + I_s \geq 2 (R \circ R)^{-1},$$

where, in this case, $R$ is constrained to have ones in its main diagonal, i.e., $R \circ I_s = I_s$.
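Proposition G.9: Consider now the positive semidefinite matrix $R \in \mathbb{S}_{+}^{s}$. Then,

$$R \circ R \geq \frac{\operatorname{diag}(R) \operatorname{diag}(R)^\top}{s}.$$

Proof: For the case where $R \in \mathbb{S}_{++}^{s}$ is positive definite, from (205) in Corollary G.7 and Lemma G.5 it
follows that

$$\begin{bmatrix} R \circ R & \operatorname{diag}(R) \\ \operatorname{diag}(R)^\top & s \end{bmatrix} \geq 0.$$

Applying again Lemma G.5 we get

$$R \circ R \geq \frac{\operatorname{diag}(R) \operatorname{diag}(R)^\top}{s}. \tag{206}$$

Now, assume that $R \in \mathbb{S}_{+}^{s}$ is positive semidefinite. We thus define $\epsilon > 0$ and consider the positive definite matrix
$R + \epsilon I_s$. From (206), we know that

$$(R + \epsilon I_s) \circ (R + \epsilon I_s) \geq \frac{\operatorname{diag}(R + \epsilon I_s) \operatorname{diag}(R + \epsilon I_s)^\top}{s}.$$

Taking the limit as $\epsilon$ tends to 0, the validity of (206) for positive semidefinite matrices follows from continuity. ■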
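The inequalities (204), (205), and (206) can be spot-checked on random positive definite matrices. The following sketch is only a numerical illustration under assumed test data; the helpers `random_spd` and `min_eig`, the dimension, and the seed are arbitrary choices, and a passing check on one sample is of course not a proof.

```python
import numpy as np

def random_spd(s, rng):
    """Random symmetric positive definite s x s matrix (illustrative test data)."""
    M = rng.standard_normal((s, s))
    return M @ M.T + 0.1 * np.eye(s)

def min_eig(M):
    """Smallest eigenvalue of the symmetric part of M."""
    return np.linalg.eigvalsh(0.5 * (M + M.T)).min()

rng = np.random.default_rng(7)
s = 5
R, T = random_spd(s, rng), random_spd(s, rng)
d = np.diag(R)        # the vector diag(R)
Dg = np.diag(d)       # the diagonal matrix Diag(R)

# (204): R o T^{-1} - Diag(R) (R o T)^{-1} Diag(R) should be positive semidefinite
gap_204 = R * np.linalg.inv(T) - Dg @ np.linalg.inv(R * T) @ Dg

# (205): diag(R)^T (R o R)^{-1} diag(R) should not exceed s
quad_205 = d @ np.linalg.solve(R * R, d)

# (206): R o R - diag(R) diag(R)^T / s should be positive semidefinite
gap_206 = R * R - np.outer(d, d) / s

print(min_eig(gap_204), quad_205, min_eig(gap_206))
```

In NumPy the `*` operator on arrays is the elementwise (Schur) product, which is why no helper is needed for $R \circ T$.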
The last lemma in this section follows. 

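Lemma G.10: For a given random vector $X$, it follows that $\mathrm{E}\{XX^\top\} \geq \mathrm{E}\{X\}\, \mathrm{E}\{X^\top\}$.
Proof: Simply note that

$$\mathrm{E}\{XX^\top\} - \mathrm{E}\{X\}\, \mathrm{E}\{X^\top\} = \mathrm{E}\{(X - \mathrm{E}\{X\})(X - \mathrm{E}\{X\})^\top\} \geq 0,$$

where the last inequality follows from the fact that the expectation preserves positive semidefiniteness. ■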



H. Integral identities involving functions and derivatives of $P_Y(y)$

The integral identities presented in this section are derived through a sequence of lemmas which lead to the main
proposition containing the identities.

First, we present a lemma, which is a straightforward generalization for non-white Gaussian random variables
of [32, Lemma 4.1].

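Lemma H.1: Assume $Y = X + Z$ is an $n$-dimensional random vector, where $X$ is arbitrarily distributed and
$Z$ is distributed following a zero-mean Gaussian distribution with covariance $R_Z$, and consider a non-empty set of
natural numbers $\mathcal{I}$, whose elements range from 1 to $n$. Then, given $\varphi > 1$, there exists a finite positive constant $\kappa$
not depending on $y$ such that

$$\left| \frac{\partial^{|\mathcal{I}|} P_Y(y)}{\prod_{i \in \mathcal{I}} \partial y_i} \right| \leq \kappa(n, \mathcal{I}, \varphi, R_Z)\, (P_Y(y))^{1/\varphi}, \tag{207}$$

where we use the notation $\prod_{i \in \mathcal{I}} \partial y_i$ to denote, e.g., $\prod_{i \in \{1,3,5\}} \partial y_i = \partial y_1 \partial y_3 \partial y_5$.

Proof: This proof follows the guidelines of the proof of [32, Lemma 4.1]. For any $R_Z > 0$, which implies
that $R_Z^{-1}$ exists, we have that $P_Y(y)$ is continuously differentiable in $y$, and

$$\frac{\partial P_Y(y)}{\partial y_i} = \frac{\partial}{\partial y_i}\, \mathrm{E}\{P_{Y|X}(y|X)\} = \mathrm{E}\left\{ \frac{\partial}{\partial y_i} P_Z(y - X) \right\} \tag{208}$$

$$= -\int P_X(x)\, P_Z(y - x)\, [(y - x)^\top R_Z^{-1}]_i\, \mathrm{d}x. \tag{209}$$

Now, using Hölder's inequality, for $\varphi > 1$ and $1/\varphi + 1/\psi = 1$ we have

$$\left| \frac{\partial P_Y(y)}{\partial y_i} \right| \leq \int \big( P_X(x) P_Z(y - x) \big)^{1/\varphi} \big( P_X(x) P_Z(y - x) \big)^{1/\psi} \big| [(y - x)^\top R_Z^{-1}]_i \big|\, \mathrm{d}x$$

$$\leq (P_Y(y))^{1/\varphi} \left( \int P_X(x)\, P_Z(y - x)\, \big| [(y - x)^\top R_Z^{-1}]_i \big|^{\psi}\, \mathrm{d}x \right)^{1/\psi}, \tag{210}$$

from which the result for $\mathcal{I} = \{i\}$, such that $|\mathcal{I}| = 1$, follows since $P_Z(y)\, \big| [y^\top R_Z^{-1}]_i \big|^{\psi}$ is bounded above by a
constant depending only on $R_Z$ and $\psi = \varphi/(\varphi - 1)$.

The inequalities for $|\mathcal{I}| > 1$ follow in a similar fashion from the fact that, for any $R_Z > 0$, the partial derivative

$$\frac{\partial^{|\mathcal{I}|} P_Z(y)}{\prod_{i \in \mathcal{I}} \partial y_i} \tag{211}$$

can be written in terms of Hermite polynomials [the right-hand side of (211), which involves the factor $(-1)^{|\mathcal{I}|}$, entries of $R_Z^{-1}$, and a Hermite polynomial, is not legible in the source],
where $H_{|\mathcal{I}|}(x)$ is the $|\mathcal{I}|$-th order Hermite polynomial defined following the convention in [33, p. 817], and noting
that the partial derivatives of the Hermite polynomial are other polynomials. ■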
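Lemma H.2: Assume $Y = X + Z$ is an $n$-dimensional random vector, where $X$ is arbitrarily distributed and
$Z$ is distributed following a zero-mean Gaussian distribution with covariance $R_Z$. Then, given $\varphi > 1$, there exists
a set of finite positive constants $\xi$ not depending on $y$ such that, for all $|\mathcal{I}| \geq 1$,

$$\left| \frac{\partial^{|\mathcal{I}|} \log P_Y(y)}{\prod_{i \in \mathcal{I}} \partial y_i} \right| \leq \sum_{\pi} \xi(n, \pi, \varphi, R_Z)\, (P_Y(y))^{|\pi|/\varphi - |\pi|}, \tag{212}$$

where the sum is over the partitions $\pi$ of the set $\mathcal{I}$.

Proof: We recall the Arbogast-Faà di Bruno formula in its most general form as given in [34, Eq. (5)] for
the partial derivative of a composite function,

$$\frac{\partial^{|\mathcal{I}|} f(g(z))}{\prod_{i \in \mathcal{I}} \partial z_i} = \sum_{\pi} \frac{\mathrm{d}^{|\pi|} f}{\mathrm{d} g^{|\pi|}} \prod_{B \in \pi} \frac{\partial^{|B|} g}{\prod_{j \in B} \partial z_j}, \tag{213}$$

where, as explained in [34], the sum is over the partitions $\pi$ of the set $\mathcal{I}$, and where $B$ represents an element of
the partition $\pi$ (note that $B$ is simply a set of indices). Thus, for each given partition $\pi$, $B$ can take $|\pi|$ different values. Consequently the order of the
derivative of the outer function, $|\pi|$, coincides with the number of factors in the product indexed by $B$.
Particularizing (213) for our case we obtain

$$\frac{\partial^{|\mathcal{I}|} \log P_Y(y)}{\prod_{i \in \mathcal{I}} \partial y_i} = \sum_{\pi} \frac{(-1)^{|\pi|-1} (|\pi| - 1)!}{(P_Y(y))^{|\pi|}} \prod_{B \in \pi} \frac{\partial^{|B|} P_Y(y)}{\prod_{j \in B} \partial y_j}. \tag{214}$$

Now, let us fix $\varphi > 1$ and $R_Z > 0$ and apply the bound found in Lemma H.1 to each factor $\partial^{|B|} P_Y(y) / \prod_{j \in B} \partial y_j$. Recalling
that there are $|\pi|$ of these factors, the bound becomes

$$\left| \frac{\partial^{|\mathcal{I}|} \log P_Y(y)}{\prod_{i \in \mathcal{I}} \partial y_i} \right| \leq \sum_{\pi} |\pi|!\, (P_Y(y))^{-|\pi|} \prod_{B \in \pi} \kappa(n, B, \varphi, R_Z)\, (P_Y(y))^{1/\varphi} \tag{215}$$

$$= \sum_{\pi} \xi(n, \pi, \varphi, R_Z)\, (P_Y(y))^{|\pi|/\varphi - |\pi|}, \tag{216}$$

where we have defined $\xi(n, \pi, \varphi, R_Z) = |\pi|! \prod_{B \in \pi} \kappa(n, B, \varphi, R_Z)$. ■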
Next, we present the lemma which is key in the proof of the proposition that contains the integral identities. 
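Lemma H.3: Assume $Y = X + Z$ is an $n$-dimensional random vector, where $X$ is arbitrarily distributed and
$Z$ is distributed following a zero-mean Gaussian distribution with covariance $R_Z$. Let us consider a set $\omega$ whose
elements $\mathcal{I}$ are sets of indices ranging from 1 to $n$. Then, given $\phi > 0$, it follows that

$$\lim_{|y_k| \to \infty} (P_Y(y))^{\phi} \prod_{\mathcal{I} \in \omega} \frac{\partial^{|\mathcal{I}|} \log P_Y(y)}{\prod_{i \in \mathcal{I}} \partial y_i} = 0, \tag{217}$$

where $k$ is an arbitrary index for the entries of vector $y$.

Proof: Applying Lemma H.2 to each one of the individual factors inside the product in (217) yields, for any
$\varphi > 1$, that

$$\left| (P_Y(y))^{\phi} \prod_{\mathcal{I} \in \omega} \frac{\partial^{|\mathcal{I}|} \log P_Y(y)}{\prod_{i \in \mathcal{I}} \partial y_i} \right| \leq (P_Y(y))^{\phi} \prod_{\mathcal{I} \in \omega} \sum_{\pi(\mathcal{I})} \xi(n, \pi(\mathcal{I}), \varphi, R_Z)\, (P_Y(y))^{|\pi(\mathcal{I})|/\varphi - |\pi(\mathcal{I})|} \tag{218}$$

$$= \prod_{\mathcal{I} \in \omega} \sum_{\pi(\mathcal{I})} \xi(n, \pi(\mathcal{I}), \varphi, R_Z)\, (P_Y(y))^{|\pi(\mathcal{I})|/\varphi - |\pi(\mathcal{I})| + \phi/|\omega|}, \tag{219}$$

where we have made explicit the dependence of the partition $\pi$ on the current value of the set of indices $\mathcal{I}$. Now we
consider a generic term $\xi(n, \pi(\mathcal{I}), \varphi, R_Z)\, (P_Y(y))^{|\pi(\mathcal{I})|/\varphi - |\pi(\mathcal{I})| + \phi/|\omega|}$ and we note that for all $\phi > 0$, there exists
a value $\varphi_{\max}(\phi, |\pi(\mathcal{I})|, |\omega|) > 1$ such that the exponent $|\pi(\mathcal{I})|/\varphi - |\pi(\mathcal{I})| + \phi/|\omega|$ is positive for any $\varphi$ inside the
interval $\varphi \in (1, \varphi_{\max}(\phi, |\pi(\mathcal{I})|, |\omega|))$. Note that, if $\phi \geq |\pi(\mathcal{I})||\omega|$, then $\varphi_{\max}(\phi, |\pi(\mathcal{I})|, |\omega|) = \infty$. Next, simply taking

$$\varphi_{\min}(\phi, \omega) = \min_{\mathcal{I} \in \omega,\, \pi(\mathcal{I})} \varphi_{\max}(\phi, |\pi(\mathcal{I})|, |\omega|), \tag{220}$$

which fulfills that $\varphi_{\min}(\phi, \omega) > 1$, we have that, for all $\varphi$ inside the non-empty interval $(1, \varphi_{\min}(\phi, \omega))$, all the
exponents of $P_Y(y)$ in (219) are positive. Since we have that $\lim_{|y_k| \to \infty} P_Y(y) = 0$ and the product and sum have
a finite number of factors and terms, respectively, we readily obtain the result of the lemma. ■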
With this last lemma at hand we are ready to prove the following proposition, which is the main purpose of this 
appendix. 

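Proposition H.4: Assume $Y = X + Z$ is an $n$-dimensional random vector, where $X$ is arbitrarily distributed
and $Z$ is distributed following a zero-mean Gaussian distribution with covariance $R_Z$. Then the following integral
identities hold:

$$\int \frac{\partial^2 P_Y(y)}{\partial y_k \partial y_l}\, \frac{\partial^2 \log P_Y(y)}{\partial y_i \partial y_j}\, \mathrm{d}y = -\int \frac{\partial P_Y(y)}{\partial y_l}\, \frac{\partial^3 \log P_Y(y)}{\partial y_i \partial y_j \partial y_k}\, \mathrm{d}y \tag{221}$$

$$\int \frac{\partial P_Y(y)}{\partial y_l}\, \frac{\partial^3 \log P_Y(y)}{\partial y_i \partial y_j \partial y_k}\, \mathrm{d}y = -\int P_Y(y)\, \frac{\partial^4 \log P_Y(y)}{\partial y_i \partial y_j \partial y_k \partial y_l}\, \mathrm{d}y. \tag{222}$$

Proof: The proof is based on integrating by parts the left hand sides of (221) and (222) and showing that there is a
term that vanishes.

Integrating by parts the left hand side of (221) we obtain

$$\int \frac{\partial^2 P_Y(y)}{\partial y_k \partial y_l}\, \frac{\partial^2 \log P_Y(y)}{\partial y_i \partial y_j}\, \mathrm{d}y = \left[ \frac{\partial P_Y(y)}{\partial y_l}\, \frac{\partial^2 \log P_Y(y)}{\partial y_i \partial y_j} \right]_{y_k = -\infty}^{y_k = \infty} - \int \frac{\partial P_Y(y)}{\partial y_l}\, \frac{\partial^3 \log P_Y(y)}{\partial y_i \partial y_j \partial y_k}\, \mathrm{d}y. \tag{223}$$

Casting Lemma H.1 with $\mathcal{I} = \{l\}$ to bound the first factor in the term inside the evaluation limits in the last equation
yields

$$\left| \frac{\partial P_Y(y)}{\partial y_l}\, \frac{\partial^2 \log P_Y(y)}{\partial y_i \partial y_j} \right| \leq \kappa(n, \{l\}, \varphi, R_Z)\, (P_Y(y))^{1/\varphi} \left| \frac{\partial^2 \log P_Y(y)}{\partial y_i \partial y_j} \right|. \tag{224}$$

According to Lemma H.3 with $\phi = 1/\varphi$ and $\omega = \{\{i,j\}\}$, the right hand side of the last equation vanishes in the limit
as $|y_k| \to \infty$, which implies the identity in (221).

Repeating the procedure for (222), the resulting term when integrating by parts is

$$\left[ P_Y(y)\, \frac{\partial^3 \log P_Y(y)}{\partial y_i \partial y_j \partial y_k} \right]_{y_l = -\infty}^{y_l = \infty}, \tag{225}$$

which is easily shown to vanish by applying Lemma H.3 with $\phi = 1$ and $\omega = \{\{i, j, k\}\}$. ■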
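A simple corollary results from Proposition H.4.

Corollary H.5: Using that $\frac{\partial P_Y(y)}{\partial y_l} = P_Y(y) \frac{\partial \log P_Y(y)}{\partial y_l}$ in the left hand side of (222), it readily follows that

$$\int P_Y(y)\, \frac{\partial \log P_Y(y)}{\partial y_l}\, \frac{\partial^3 \log P_Y(y)}{\partial y_i \partial y_j \partial y_k}\, \mathrm{d}y = -\int P_Y(y)\, \frac{\partial^4 \log P_Y(y)}{\partial y_i \partial y_j \partial y_k \partial y_l}\, \mathrm{d}y \tag{226}$$

$$\mathrm{E}\left\{ \frac{\partial \log P_Y(Y)}{\partial y_l}\, \frac{\partial^3 \log P_Y(Y)}{\partial y_i \partial y_j \partial y_k} \right\} = -\mathrm{E}\left\{ \frac{\partial^4 \log P_Y(Y)}{\partial y_i \partial y_j \partial y_k \partial y_l} \right\}, \tag{227}$$

which is a higher-dimensional version of the regularity condition

$$\mathrm{E}\left\{ \frac{\partial \log P_Y(Y)}{\partial y_l}\, \frac{\partial \log P_Y(Y)}{\partial y_k} \right\} = -\mathrm{E}\left\{ \frac{\partial^2 \log P_Y(Y)}{\partial y_k \partial y_l} \right\}, \tag{228}$$

which is used for example in the derivation of the CRLB in [14].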
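The regularity conditions (227) and (228) can be illustrated in the scalar case ($n = 1$, all indices equal) by numerical integration. The sketch below assumes a two-point input $X$ plus unit-variance Gaussian noise, so that $P_Y$ is a two-component Gaussian mixture, and approximates the derivatives of $\log P_Y$ by repeated central differences on a fine grid. The mixture parameters, the grid, and the tolerance are illustrative assumptions, and the agreement is only up to discretization error.

```python
import numpy as np

h = 0.005
y = np.arange(-15.0, 15.0 + h, h)
weights = np.array([0.3, 0.7])     # probabilities of the two input values (assumed)
means = np.array([-1.5, 1.0])      # the two input values (assumed)
sigma = 1.0                        # noise standard deviation (assumed)

# log of each weighted Gaussian component, shape (2, len(y)); log-sum-exp for stability
log_terms = (np.log(weights)[:, None]
             - 0.5 * ((y[None, :] - means[:, None]) / sigma) ** 2
             - 0.5 * np.log(2 * np.pi * sigma ** 2))
top = log_terms.max(axis=0)
log_p = top + np.log(np.exp(log_terms - top).sum(axis=0))
p = np.exp(log_p)

# successive derivatives of log P_Y by central differences
d1 = np.gradient(log_p, h)
d2 = np.gradient(d1, h)
d3 = np.gradient(d2, h)
d4 = np.gradient(d3, h)

def expect(f):
    """Riemann-sum approximation of E{f(Y)}; the integrand is negligible at the grid edges."""
    return np.sum(p * f) * h

lhs_228, rhs_228 = expect(d1 * d1), -expect(d2)
lhs_227, rhs_227 = expect(d1 * d3), -expect(d4)
print(lhs_228, rhs_228)
print(lhs_227, rhs_227)
```

Working with $\log P_Y$ directly, rather than dividing derivatives of $P_Y$ by $P_Y$, avoids numerical problems in the tails where the density underflows.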

I. Integral identities involving functions of $\mathrm{E}\{S \mid y\}$

Similarly as in the previous appendix, the integral identities presented in this section are derived through a lemma
which leads to the main proposition containing the identities.

Lemma I.1: Assume $Y = GS + Z$ is an $n$-dimensional random vector, where $G$ is a deterministic matrix, $S$
is an arbitrarily distributed random vector, and $Z$ is distributed following a zero-mean Gaussian distribution with
covariance $R_Z$. Consider a set of $M$ functions $f_i(S)$, which have polynomial dependence on the elements of $S$.
Then, given $\varphi > 1$, there exists a finite positive constant $\kappa$ not depending on $y$ such that

$$\left| P_Y(y) \prod_{i=1}^{M} \mathrm{E}\{f_i(S) \mid y\} \right| \leq \kappa(n, \{f_i\}, \varphi, R_Z)\, (P_Y(y))^{\frac{M - \varphi(M-1)}{\varphi}}. \tag{229}$$

Proof: The proof follows by first noticing that

$$|\mathrm{E}\{f_i(S) \mid y\}| \leq \frac{1}{P_Y(y)} \int |f_i(s)|\, P_{Y|S}(y|s)\, P_S(s)\, \mathrm{d}s = \frac{1}{P_Y(y)} \int |f_i(s)|\, P_Z(y - Gs)\, P_S(s)\, \mathrm{d}s,$$

and then using Hölder's inequality with $1/\varphi + 1/\psi = 1$ in an analogous way as we have done in (210) we obtain

$$\int |f_i(s)|\, P_Z(y - Gs)\, P_S(s)\, \mathrm{d}s = \int \big( P_S(s) P_Z(y - Gs) \big)^{1/\varphi} \big( P_S(s) P_Z(y - Gs) \big)^{1/\psi} |f_i(s)|\, \mathrm{d}s$$

$$\leq (P_Y(y))^{1/\varphi} \left( \int P_S(s)\, P_Z(y - Gs)\, |f_i(s)|^{\psi}\, \mathrm{d}s \right)^{1/\psi} \tag{230}$$

$$\leq \zeta(n, f_i, \varphi, R_Z)\, (P_Y(y))^{1/\varphi}, \tag{231}$$

where the last inequality follows from the fact that $P_Z(y - Gs)\, |f_i(s)|^{\psi}$ is bounded above by the constant $\zeta(n, f_i, \varphi, R_Z)$
not depending on $y$ due to the fact that $f_i(s)$ is a polynomial on the entries of $s$.

Considering the product $P_Y(y) \prod_{i=1}^{M} |\mathrm{E}\{f_i(S) \mid y\}|$, the result of the lemma follows by noting that the new constant
becomes $\kappa(n, \{f_i\}, \varphi, R_Z) = \prod_{i=1}^{M} \zeta(n, f_i, \varphi, R_Z)$. ■
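Proposition I.2: Assume $Y = GS + Z$ is an $n$-dimensional random vector, where $G$ is a deterministic matrix,
$S$ is an arbitrarily distributed random vector, and $Z$ is distributed following a zero-mean Gaussian distribution with
covariance $R_Z$. Then, the following integral identities hold:

$$\int \frac{\partial \big( P_Y(y)\, \mathrm{E}\{S_i S_l \mid y\} \big)}{\partial y_k}\, \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y = -\int P_Y(y)\, \mathrm{E}\{S_i S_l \mid y\}\, \frac{\partial \mathrm{E}\{S_j \mid y\}}{\partial y_k}\, \mathrm{d}y \tag{232}$$

$$\int \frac{\partial \big( P_Y(y)\, \mathrm{E}\{S_l \mid y\} \big)}{\partial y_k}\, \mathrm{E}\{S_i \mid y\}\, \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y = -\int P_Y(y)\, \mathrm{E}\{S_l \mid y\}\, \frac{\partial \big( \mathrm{E}\{S_i \mid y\}\, \mathrm{E}\{S_j \mid y\} \big)}{\partial y_k}\, \mathrm{d}y.$$

Proof: Integrating by parts the left hand side of (232) we have

$$\int \frac{\partial \big( P_Y(y)\, \mathrm{E}\{S_i S_l \mid y\} \big)}{\partial y_k}\, \mathrm{E}\{S_j \mid y\}\, \mathrm{d}y = \Big[ P_Y(y)\, \mathrm{E}\{S_i S_l \mid y\}\, \mathrm{E}\{S_j \mid y\} \Big]_{y_k = -\infty}^{y_k = \infty} - \int P_Y(y)\, \mathrm{E}\{S_i S_l \mid y\}\, \frac{\partial \mathrm{E}\{S_j \mid y\}}{\partial y_k}\, \mathrm{d}y. \tag{233}$$

Using Lemma I.1 with $M = 2$, $f_1(S) = S_i S_l$, and $f_2(S) = S_j$, we have that

$$\big| P_Y(y)\, \mathrm{E}\{S_i S_l \mid y\}\, \mathrm{E}\{S_j \mid y\} \big| \leq \kappa(n, \{f_i\}, \varphi, R_Z)\, (P_Y(y))^{(2 - \varphi)/\varphi}. \tag{234}$$

Now choosing $1 < \varphi < 2$ it is easy to see that the first term in the right hand side of (233) vanishes as
$\lim_{|y_k| \to \infty} P_Y(y) = 0$. Proceeding similarly with the second integral identity with $M = 3$, $f_1(S) = S_l$, $f_2(S) = S_i$,
and $f_3(S) = S_j$ and choosing $1 < \varphi < 3/2$, the result in the proposition follows. ■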